Search Results: "mats"

13 July 2022

Reproducible Builds: Reproducible Builds in June 2022

Welcome to the June 2022 report from the Reproducible Builds project. In these reports, we outline the most important things that we have been up to over the past month. As a quick recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries.

Save the date! Despite several delays, we are pleased to announce the dates for our in-person summit this year: November 1st – November 3rd 2022.
The event will happen in/around Venice (Italy), and we intend to pick a venue reachable via the train station and an international airport. However, the precise venue will depend on the number of attendees. Please see the announcement mail from Mattia Rizzolo, and do keep an eye on the mailing list for further announcements as it will hopefully include registration instructions.

News David Wheeler filed an issue against the Rust programming language to report that builds are not reproducible because the full path to the source code ends up in panic and debug strings. Luckily, as one of the responses mentions, the --remap-path-prefix flag solves this problem and "has been used to great effect in build systems that rely on reproducibility (Bazel, Nix) to work at all", and there are efforts to teach cargo about it here.
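As a rough sketch of how that workaround is typically applied (the placeholder target path and the use of RUSTFLAGS are illustrative assumptions, not taken from the issue itself):
# Remap the absolute source directory to a fixed placeholder so that panic
# and debug strings no longer embed the local build path.
RUSTFLAGS="--remap-path-prefix=$(pwd)=/build/src" cargo build --release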
The Python Security team announced that:
The ctx hosted project on PyPI was taken over via user account compromise and replaced with a malicious project which contained runtime code which collected the content of os.environ.items() when instantiating Ctx objects. The captured environment variables were sent as a base64 encoded query parameter to a Heroku application [ ]
As their announcement later goes on to state, version pinning using hash-checking mode can prevent this attack, although this does depend on specific installations using this mode, rather than being a prevention that can be applied systematically.
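For illustration, hash-checking mode looks roughly like the following; the version number and digest here are placeholders rather than real values for any ctx release:
# requirements.txt: pin the exact version together with its expected digest
ctx==0.1.2 --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000
# pip then refuses to install any artifact whose digest does not match
pip install --require-hashes -r requirements.txt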
Developer vanitasvitae published an interesting and entertaining blog post detailing the blow-by-blow steps of debugging a reproducibility issue in PGPainless, a library which aims to make using OpenPGP in Java projects as simple as possible. Whilst their in-depth research into the internals of the .jar file may have been unnecessary given that diffoscope would have identified the issue, there is something to be said for occasionally delving into seemingly low-level details, as well as for describing the debugging process itself. Indeed, as vanitasvitae writes:
Yes, this would have spared me from 3h of debugging. But I probably would also not have gone on this little dive into the JAR/ZIP format, so in the end I'm not mad.

Kees Cook published a short and practical blog post detailing how he uses reproducibility properties to aid his work replacing one-element arrays in the Linux kernel. Kees' approach is based on the principle that if a (small) proposed change is considered equivalent by the compiler, then the generated output will be identical, but only if no other arbitrary or unrelated changes are introduced. Kees mentions the fantastic diffoscope tool, as well as various kernel-specific build options (eg. KBUILD_BUILD_TIMESTAMP) in order to "prepare my build with the known to disrupt code layout options disabled".
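A hedged sketch of this kind of before/after comparison (the choice of object file is arbitrary here, and the exact variables a given tree needs may differ):
# Pin down build metadata known to disrupt comparisons, build the object
# before and after the proposed patch, then compare the two with diffoscope.
export KBUILD_BUILD_TIMESTAMP='2022-01-01' KBUILD_BUILD_USER=build KBUILD_BUILD_HOST=build
make drivers/char/mem.o && cp drivers/char/mem.o /tmp/before.o
# ...apply the supposedly-equivalent patch...
make drivers/char/mem.o && diffoscope /tmp/before.o drivers/char/mem.o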
Stefano Zacchiroli gave a presentation at GDR Sécurité Informatique based in part on a paper co-written with Chris Lamb titled Increasing the Integrity of Software Supply Chains. (Tweet)

Debian In Debian this month, 28 reviews of Debian packages were added, 35 were updated and 27 were removed, adding to our knowledge about identified issues. Two issue types were added: nondeterministic_checksum_generated_by_coq and nondetermistic_js_output_from_webpack. After Holger Levsen found hundreds of packages in the bookworm distribution that lack .buildinfo files, he uploaded 404 source packages to the archive (with no meaningful source changes). bookworm now shows only 8 packages without .buildinfo files, and those 8 are fixed in unstable and should migrate shortly. By contrast, Debian unstable will always have packages without .buildinfo files, as this is how they come through the NEW queue. However, as these packages were not built on the official build servers (ie. they were uploaded by the maintainer), they will never migrate to Debian testing. In the future, therefore, testing should never have packages without .buildinfo files again. Roland Clobus posted yet another in-depth status report about his progress making the Debian Live images build reproducibly to our mailing list. In this update, Roland mentions that all major desktops build reproducibly with bullseye, bookworm and sid, but also goes on to outline the progress made with automated testing of the generated images using openQA.

GNU Guix Vagrant Cascadian made a significant number of contributions to GNU Guix: Elsewhere in GNU Guix, Ludovic Courtès published a paper in the journal The Art, Science, and Engineering of Programming called Building a Secure Software Supply Chain with GNU Guix:
This paper focuses on one research question: how can [Guix](https://www.gnu.org/software/guix/) and similar systems allow users to securely update their software? [ ] Our main contribution is a model and tool to authenticate new Git revisions. We further show how, building on Git semantics, we build protections against downgrade attacks and related threats. We explain implementation choices. This work has been deployed in production two years ago, giving us insight on its actual use at scale every day. The Git checkout authentication at its core is applicable beyond the specific use case of Guix, and we think it could benefit to developer teams that use Git.
A full PDF of the text is available.

openSUSE In the world of openSUSE, SUSE announced at SUSECon that they are preparing to meet SLSA level 4. (SLSA (Supply chain Levels for Software Artifacts) is a new industry-led standardisation effort that aims to protect the integrity of the software supply chain.) However, at the time of writing, timestamps within RPM archives are not normalised, so bit-for-bit identical reproducible builds are not possible. Some in-toto provenance files were published for SUSE's SLE-15-SP4 as one result of the SLSA level 4 effort. Old binaries are not rebuilt, so only new builds (e.g. maintenance updates) have this metadata added. Lastly, Bernhard M. Wiedemann posted his usual monthly openSUSE reproducible builds status report.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 215, 216 and 217 to Debian unstable. Chris Lamb also made the following changes:
  • New features:
    • Print profile output if we were called with --profile and we were killed via a TERM signal. This should help in situations where diffoscope is terminated due to some sort of timeout. [ ]
    • Support both PyPDF 1.x and 2.x. [ ]
  • Bug fixes:
    • Also catch IndexError exceptions (in addition to ValueError) when parsing .pyc files. (#1012258)
    • Correct the logic for supporting different versions of the argcomplete module. [ ]
  • Output improvements:
    • Don't leak the (likely-temporary) pathname when comparing PDF documents. [ ]
  • Logging improvements:
    • Update test fixtures for GNU readelf 2.38 (now in Debian unstable). [ ][ ]
    • Be more specific about the minimum required version of readelf (ie. binutils), as it appears that this patch level version change resulted in a change of output, not the minor version. [ ]
    • Use our @skip_unless_tool_is_at_least decorator (NB. at_least) over @skip_if_tool_version_is (NB. is) to fix tests under Debian stable. [ ]
    • Emit a warning if/when we are handling a UNIX TERM signal. [ ]
  • Codebase improvements:
    • Clarify in what situations the main finally block gets called with respect to TERM signal handling. [ ]
    • Clarify control flow in the diffoscope.profiling module. [ ]
    • Correctly package the scripts/ directory. [ ]
In addition, Edward Betts updated a broken link to the RSS on the diffoscope homepage and Vagrant Cascadian updated the diffoscope package in GNU Guix [ ][ ][ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Testing framework The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Add a package set for packages that use the R programming language [ ] as well as one for Rust [ ].
    • Improve package set matching for Python [ ] and font-related [ ] packages.
    • Install the lz4, lzop and xz-utils packages on all nodes in order to detect running kernels. [ ]
    • Improve the cleanup mechanisms when testing the reproducibility of Debian Live images. [ ][ ]
    • In the automated node health checks, deprioritise the "generic kernel warning". [ ]
  • Roland Clobus (Debian Live image reproducibility):
    • Add various maintenance jobs to the Jenkins view. [ ]
    • Cleanup old workspaces after 24 hours. [ ]
    • Cleanup temporary workspace and resulting directories. [ ]
    • Implement a number of fixes and improvements around publishing files. [ ][ ][ ]
    • Don't attempt to preserve the file timestamps when copying artifacts. [ ]
And finally, node maintenance was also performed by Mattia Rizzolo [ ].

Mailing list and website On our mailing list this month: Lastly, Chris Lamb updated the main Reproducible Builds website and documentation in a number of small ways, but primarily published an interview with Hans-Christoph Steiner of the F-Droid project. Chris Lamb also added a CoffeeScript example for parsing and using the SOURCE_DATE_EPOCH environment variable [ ]. In addition, Sebastian Crane very helpfully updated the screenshot of salsa.debian.org's "request access" button on the How to join the Salsa group page. [ ]
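As a reminder of how the variable is typically consumed, a build script can derive it from version control along these lines (a common shell sketch, unrelated to the CoffeeScript example itself):
# Use the committer date of the latest Git commit as the canonical build time
export SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)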

Contact If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

6 June 2022

Reproducible Builds: Reproducible Builds in May 2022

Welcome to the May 2022 report from the Reproducible Builds project. In our reports we outline the most important things that we have been up to over the past month. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.

Repfix paper Zhilei Ren, Shiwei Sun, Jifeng Xuan, Xiaochen Li, Zhide Zhou and He Jiang have published an academic paper titled Automated Patching for Unreproducible Builds:
[..] fixing unreproducible build issues poses a set of challenges [..], among which we consider the localization granularity and the historical knowledge utilization as the most significant ones. To tackle these challenges, we propose a novel approach [called] RepFix that combines tracing-based fine-grained localization with history-based patch generation mechanisms.
The paper (PDF, 3.5MB) uses the Debian mylvmbackup package as an example to show how RepFix can automatically generate patches to make software build reproducibly. As it happens, Reiner Herrmann submitted a patch for the mylvmbackup package which has remained unapplied by the Debian package maintainer for over seven years, thus this paper inadvertently underscores that achieving reproducible builds will require both technical and social solutions.

Python variables Johannes Schauer discovered a fascinating bug where simply naming your Python variable _m led to unreproducible .pyc files. In particular, the types module in Python 3.10 requires the following patch to make it reproducible:
--- a/Lib/types.py
+++ b/Lib/types.py
@@ -37,8 +37,8 @@ _ag = _ag()
 AsyncGeneratorType = type(_ag)
 
 class _C:
-    def _m(self): pass
-MethodType = type(_C()._m)
+    def _b(self): pass
+MethodType = type(_C()._b)
Simply renaming the dummy method from _m to _b was enough to work around the problem. Johannes' bug report first led to a number of improvements in diffoscope to aid in dissecting .pyc files, but upstream identified the cause as an issue surrounding interned strings, which is being tracked in CPython bug #78274.
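A minimal way to investigate this class of problem yourself is to byte-compile the module in two build trees and hand the results to diffoscope; the build-a/build-b paths and the Python 3.10 tag below are assumptions made for the sake of the example:
# Compile the same module in two separate trees, then compare the bytecode
python3 -m py_compile build-a/types.py
python3 -m py_compile build-b/types.py
diffoscope build-a/__pycache__/types.cpython-310.pyc build-b/__pycache__/types.cpython-310.pyc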

New SPDX team to incorporate build metadata in Software Bill of Materials SPDX, the open standard for Software Bill of Materials (SBOM), is continuously developed by a number of teams and committees. However, SPDX has welcomed a new addition: a team dedicated to enhancing metadata about software builds, complementing reproducible builds in creating a more secure software supply chain. The SPDX Builds Team has been working throughout May to define the universal primitives shared by all build systems, including the who, what, where and how of builds:
  • Who: the identity of the person or organisation that controls the build infrastructure.
  • What: the inputs and outputs of a given build, combining metadata about the build's configuration with an SBOM describing source code and dependencies.
  • Where: the software packages making up the build system, from build orchestration tools such as Woodpecker CI and Tekton to language-specific tools.
  • How: the invocation of a build, linking metadata of a build to the identity of the person or automation tool that initiated it.
The SPDX Builds Team expects to have a usable data model by September, ready for inclusion in the SPDX 3.0 standard. The team welcomes new contributors, inviting those interested in joining to introduce themselves on the SPDX-Tech mailing list.

Talks at Debian Reunion Hamburg Some of the Reproducible Builds team (Holger Levsen, Mattia Rizzolo, Roland Clobus, Philip Rinn, etc.) met in real life at the Debian Reunion Hamburg (official homepage). There were several informal discussions amongst them, as well as two talks related to reproducible builds. First, Holger Levsen gave a talk on the status of Reproducible Builds for bullseye and bookworm and beyond (WebM, 210MB): Secondly, Roland Clobus gave a talk called Reproducible builds as applied to non-compiler output (WebM, 115MB):

Supply-chain security attacks This was another bumper month for supply-chain attacks in package repositories. Early in the month, Lance R. Vick noticed that the maintainer of the NPM foreach package let their personal email domain expire, so he bought it and now "controls foreach on NPM and the 36,826 projects that depend on it". Shortly afterwards, Drew DeVault published a related blog post titled When will we learn? that offers a brief timeline of major incidents in this area and, not uncontroversially, suggests that the correct way to ship packages is "with your distribution's package manager".

Bootstrapping Bootstrapping is a process for building software tools progressively from a primitive compiler tool and source language up to a full Linux development environment with GCC, etc. This is important given the amount of trust we put in existing compiler binaries. This month, a bootstrappable mini-kernel was announced. Called boot2now, it comprises a series of compilers in the form of bootable machine images.

Google s new Assured Open Source Software service Google Cloud (the division responsible for the Google Compute Engine) announced a new Assured Open Source Software service. Noting the considerable 650% year-over-year increase in cyberattacks aimed at open source suppliers, the new service claims to enable enterprise and public sector users of open source software to easily incorporate the same OSS packages that Google uses into their own developer workflows . The announcement goes on to enumerate that packages curated by the new service would be:
  • Regularly scanned, analyzed, and fuzz-tested for vulnerabilities.
  • Have corresponding enriched metadata incorporating Container/Artifact Analysis data.
  • Are built with Cloud Build, including evidence of verifiable SLSA-compliance.
  • Are verifiably signed by Google.
  • Are distributed from an Artifact Registry secured and protected by Google.
(Full announcement)

A retrospective on the Rust programming language Andrew "bunnie" Huang published a long blog post this month promising a critical retrospective on the Rust programming language. Amongst many acute observations about the evolution of the language's syntax (etc.), the post begins to critique the language's approach to supply chain security ("Rust Has A Limited View of Supply Chain Security") and reproducibility ("You Can't Reproduce Someone Else's Rust Build"):
There's some bugs open with the Rust maintainers to address reproducible builds, but with the number of issues they have to deal with in the language, I am not optimistic that this problem will be resolved anytime soon. Assuming the only driver of the unreproducibility is the inclusion of OS paths in the binary, one fix to this would be to re-configure our build system to run in some sort of a chroot environment or a virtual machine that fixes the paths in a way that almost anyone else could reproduce. I say almost anyone else because this fix would be OS-dependent, so we'd be able to get reproducible builds under, for example, Linux, but it would not help Windows users where chroot environments are not a thing.
(Full post)

Reproducible Builds IRC meeting The minutes and logs from our May 2022 IRC meeting have been published. In case you missed this one, our next IRC meeting will take place on Tuesday 28th June at 15:00 UTC on #reproducible-builds on the OFTC network.

A new tool to improve supply-chain security in Arch Linux kpcyrd published yet another interesting tool related to reproducibility. Writing about the tool in a recent blog post, kpcyrd mentions that although many PKGBUILDs provide authentication in the context of signed Git tags (i.e. "the ability to verify the Git tag was signed by one of the two trusted keys"), they do not support pinning, ie. that "upstream could create a new signed Git tag with an identical name, and arbitrarily change the source code without the [maintainer] noticing". Conversely, other PKGBUILDs support pinning but not authentication. The new tool, auth-tarball-from-git, fixes both problems, as neatly outlined in kpcyrd's original blog post.
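To make the authentication-versus-pinning distinction concrete, a plain Git-level check might look roughly like this; the tag name and commit hash are placeholders, and this is not the actual interface of auth-tarball-from-git:
# Authentication: check that the tag is signed by a key in the local keyring
git verify-tag v1.2.3
# Pinning: check that the tag still resolves to the exact commit previously reviewed
test "$(git rev-parse 'v1.2.3^{commit}')" = "0123456789abcdef0123456789abcdef01234567"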

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 212, 213 and 214 to Debian unstable. Chris also made the following changes:
  • New features:
    • Add support for extracting vmlinuz Linux kernel images. [ ]
    • Support both python-argcomplete 1.x and 2.x. [ ]
    • Strip "sticky" etc. from "x.deb: sticky Debian binary package [ ]". [ ]
    • Integrate test coverage with GitLab s concept of artifacts. [ ][ ][ ]
  • Bug fixes:
    • Don't mask differences in .zip or .jar central directory extra fields. [ ]
    • Don't show a binary comparison of .zip or .jar files if we have observed at least one nested difference. [ ]
  • Codebase improvements:
    • Substantially update comment for our calls to zipinfo and zipinfo -v. [ ]
    • Use assert_diff in test_zip over calling get_data with a separate assert. [ ]
    • Don't call re.compile and then call .sub on the result; just call re.sub directly. [ ]
    • Clarify the comment around the difference between --usage and --help. [ ]
  • Testsuite improvements:
    • Test --help and --usage. [ ]
    • Test that --help includes the file formats. [ ]
Vagrant Cascadian added an external tool reference xb-tool for GNU Guix [ ] as well as updated the diffoscope package in GNU Guix itself [ ][ ][ ].

Distribution work In Debian, 41 reviews of Debian packages were added, 85 were updated and 13 were removed this month adding to our knowledge about identified issues. A number of issue types have been updated, including adding a new nondeterministic_ordering_in_deprecated_items_collected_by_doxygen toolchain issue [ ] as well as ones for mono_mastersummary_xml_files_inherit_filesystem_ordering [ ], extended_attributes_in_jar_file_created_without_manifest [ ] and apxs_captures_build_path [ ]. Vagrant Cascadian performed a rough check of the reproducibility of core package sets in GNU Guix, and in openSUSE, Bernhard M. Wiedemann posted his usual monthly reproducible builds status report.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Reproducible builds website Chris Lamb updated the main Reproducible Builds website and documentation in a number of small ways, but also prepared and published an interview with Jan Nieuwenhuizen about Bootstrappable Builds, GNU Mes and GNU Guix. [ ][ ][ ][ ] In addition, Tim Jones added a link to the Talos Linux project [ ] and billchenchina fixed a dead link [ ].

Testing framework The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Add support for detecting running kernels that require attention. [ ]
    • Temporarily configure a host to support performing Debian builds for packages that lack .buildinfo files. [ ]
    • Update generated webpages to clarify wishes for feedback. [ ]
    • Update copyright years on various scripts. [ ]
  • Mattia Rizzolo:
    • Provide a facility so that Debian Live image generation can copy a file remotely. [ ][ ][ ][ ]
  • Roland Clobus:
    • Add initial support for testing generated images with OpenQA. [ ]
And finally, as usual, node maintenance was also performed by Holger Levsen [ ][ ].

Misc news On our mailing list this month:

Contact If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

20 May 2022

Louis-Philippe Véronneau: Introducing metalfinder

After going to an incredible Arch Enemy / Behemoth / Napalm Death / Unto Others concert a few weeks ago, I decided I wanted to go to more concerts. I like music, and I really enjoy concerts. Sadly, I often miss great performances because no one told me about them, or my local newspaper didn't cover the event enough in advance for me to get tickets. Some online services let you sync your Spotify account to notify you when a new concert is announced, but I don't use Spotify. As a music geek, I have a local music collection and if I need to stream it, I have a supysonic server. Introducing metalfinder, a CLI tool to find concerts using your local music collection! At the moment, it scans your music collection, creates a list of artists and queries Bandsintown for concerts in your town. Multiple output formats are supported, but I mainly use the ATOM one, as I'm a heavy feed reader user. [Image: Screenshot of the ATOM output in my feed reader] The current metalfinder version (1.1.1) is an MVP: it works well enough, but I still have a lot of work to do... If you want to give it a try, the easiest way is to download it from PyPI. metalfinder is also currently in NEW and I'm planning to have something feature-complete in time for the Bookworm freeze.

5 May 2022

Reproducible Builds: Reproducible Builds in April 2022

Welcome to the April 2022 report from the Reproducible Builds project! In these reports, we try to summarise the most important things that we have been up to over the past month. If you are interested in contributing to the project, please take a few moments to visit our Contribute page on our website.

News Cory Doctorow published an interesting article this month about the possibility of Undetectable backdoors for machine learning models. Given that machine learning models can provide unpredictably incorrect results, Doctorow recounts that there exists another category of adversarial examples that comprise "a gimmicked machine-learning input that, to the human eye, seems totally normal but which causes the ML system to misfire dramatically", and that permit the possibility of planting "undetectable back doors into any machine learning system at training time".
Chris Lamb published two supporter spotlights on our blog: the first about Amateur Radio Digital Communications (ARDC) and the second about the Google Open Source Security Team (GOSST).
Piergiorgio Ladisa, Henrik Plate, Matias Martinez and Olivier Barais published a new academic paper titled A Taxonomy of Attacks on Open-Source Software Supply Chains (PDF):
This work proposes a general taxonomy for attacks on open-source supply chains, independent of specific programming languages or ecosystems, and covering all supply chain stages from code contributions to package distribution. Taking the form of an attack tree, it covers 107 unique vectors, linked to 94 real-world incidents, and mapped to 33 mitigating safeguards.

Elsewhere in academia, Ly Vu Duc published his PhD thesis. Titled Towards Understanding and Securing the OSS Supply Chain (PDF), Duc's abstract reads as follows:
This dissertation starts from the first link in the software supply chain, "developers": since many developers do not update their vulnerable software libraries, they expose the users of their code to security risks. To understand how they choose, manage and update the libraries, packages, and other Open-Source Software (OSS) that become the building blocks of companies' completed products consumed by end-users, twenty-five semi-structured interviews were conducted with developers of both large and small-medium enterprises in nine countries. All interviews were transcribed, coded, and analyzed according to applied thematic analysis.

Upstream news Filippo Valsorda recently published an informative blog post called How Go Mitigates Supply Chain Attacks, outlining the high-level features of the Go ecosystem that help prevent various supply-chain attacks.
There was new/further activity on a pull request filed against openssl by Sebastian Andrzej Siewior in order to prevent saved CFLAGS (which may contain the -fdebug-prefix-map=<PATH> flag that is used to strip an arbitrary build path from the debug info) from being recorded; if this information remains recorded, then the binary is no longer reproducible if the build directory changes.

Events The Linux Foundation's SupplyChainSecurityCon will take place June 21st–24th 2022, both virtually and in Austin, Texas. Long-time Reproducible Builds and openSUSE contributor Bernhard M. Wiedemann learned that his talk had been accepted, and will speak on Reproducible Builds: Unexpected Benefits and Problems on June 21st.
There will be an in-person Debian Reunion in Hamburg, Germany later this year, taking place from 23–30 May. Although this is a Debian event, there will be some folks from the broader Reproducible Builds community and, of course, everyone is welcome. Please see the event page on the Debian wiki for more information. 41 people have registered so far, and there are approximately 10 on-site beds still left.
The minutes and logs from our April 2022 IRC meeting have been published. In case you missed this one, our next IRC meeting will take place on May 31st at 15:00 UTC on #reproducible-builds on the OFTC network.

Debian Roland Clobus wrote another in-depth status update about the status of live Debian images, summarising the current situation: all major desktops build reproducibly with bullseye, bookworm and sid, including the Cinnamon desktop on bookworm and sid, but at a small functionality cost: "14 words will be incorrectly abbreviated". This work incorporated:
  • Reporting an issue about unnecessarily modified timestamps in the daily Debian installer images. [ ]
  • Reporting a bug against the debian-installer in order to use a suitable kernel version. (#1006800)
  • Reporting a bug in texlive-binaries regarding the unreproducible content of .fmt files. (#1009196)
  • Adding hacks to make the Cinnamon desktop image reproducible in bookworm and sid. [ ]
  • Adding a script to rebuild a live-build ISO image from a given timestamp. [ ]
  • etc.
On our mailing list, Venkata Pyla started a thread on the issue of the Debian debconf cache being non-reproducible when creating system images, and Vagrant Cascadian posted an excellent summary of the reproducibility status of core package sets in Debian and solicited similar information from other distributions.
Lastly, 122 reviews of Debian packages were added, 44 were updated and 193 were removed this month, adding to our extensive knowledge about identified issues. A number of issue types have been updated as well, including timestamps_generated_by_hevea, randomness_in_ocaml_preprocessed_files, build_path_captured_in_emacs_el_file, golang_compiler_captures_build_path_in_binary and build_path_captured_in_assembly_objects.

Other distributions Happy birthday to GNU Guix, which recently turned 10 years old! People have been sharing their stories, in which reproducible builds and bootstrappable builds are a recurring theme as a feature important to its users and developers. The experiences are available on the GNU Guix blog as well as in a post on fossandcrafts.org.
In openSUSE, Bernhard M. Wiedemann posted his usual monthly reproducible builds status report.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 210 and 211 to Debian unstable, as well as noticed that some Python .pyc files are reported as data, so we should support .pyc as a fallback filename extension [ ]. In addition, Mattia Rizzolo disabled the Gnumeric tests in Debian as the package is not currently available [ ] and dropped mplayer from Build-Depends too [ ]. Mattia also fixed an issue to ensure that the PATH environment variable is properly modified for all actions, not just when running the comparator. [ ]

Testing framework The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Daniel Golle:
    • Prefer a different solution to avoid building all OpenWrt packages; skip packages from optional community feeds. [ ]
  • Holger Levsen:
    • Detect Python deprecation warnings in the node health check. [ ]
    • Detect failure to build the Debian Installer. [ ]
  • Mattia Rizzolo:
    • Install disorderfs for building OpenWrt packages. [ ]
  • Paul Spooren (OpenWrt-related changes):
    • Don't build all packages whilst the core packages are not yet reproducible. [ ]
    • Add a missing RUN directive to node_cleanup. [ ]
    • Be less verbose during a toolchain build. [ ]
    • Use disorderfs for rebuilds and update the documentation to match. [ ][ ][ ]
  • Roland Clobus:
    • Publish the last reproducible Debian ISO image. [ ]
    • Use the rebuild.sh script from the live-build package. [ ]
Lastly, node maintenance was also performed by Holger Levsen [ ][ ].
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

8 April 2022

Reproducible Builds: Reproducible Builds in March 2022

Welcome to the March 2022 report from the Reproducible Builds project! In our monthly reports we outline the most important things that we have been up to over the past month.
The in-toto project was accepted as an incubating project within the Cloud Native Computing Foundation (CNCF). in-toto is a framework that protects the software supply chain by collecting and verifying relevant data. It does so by enabling libraries to collect information about software supply chain actions and then allowing software users and/or project managers to publish policies about software supply chain practices that can be verified before deploying or installing software. The CNCF hosts a number of critical components of the global technology infrastructure under the auspices of the Linux Foundation. (View the full announcement.)
Hervé Boutemy posted to our mailing list with an announcement that the Java Reproducible Central has hit the milestone of "500 fully reproduced builds of upstream projects". Indeed, at the time of writing, according to the nightly rebuild results, 530 releases were found to be fully reproducible, with 100% reproducible artifacts.
GitBOM is a relatively new project to enable build tools to trace every source file that is incorporated into build artifacts. As an experiment and/or proof-of-concept, the GitBOM developers are rebuilding Debian to generate side-channel build metadata for versions of Debian that have already been released. This only works because Debian is (partially) reproducible, so one can be sure that, in the case where build artifacts are identical, any metadata generated during these instrumented builds also applies to the binaries that were built and released in the past. More information on their approach is available in the README file in the bomsh repository.
Ludovic Courtès has published an academic paper discussing how the performance requirements of high-performance computing are not (as usually assumed) at odds with reproducible builds. The received wisdom is that vendor-specific libraries and platform-specific CPU extensions have resulted in a culture of local recompilation to ensure the best performance, rendering the property of reproducibility unobtainable or even meaningless. In his paper, Ludovic explains how Guix has:
[ ] implemented what we call package multi-versioning for C/C++ software that lacks function multi-versioning and run-time dispatch [ ]. It is another way to ensure that users do not have to trade reproducibility for performance. (full PDF)

Kit Martin posted to the FOSSA blog a post titled The Three Pillars of Reproducible Builds. Inspired by "the shock of infiltrated or intentionally broken NPM packages, supply chain attacks, long-unnoticed backdoors", the post goes on to outline the high-level steps that lead to a reproducible build:
It is one thing to talk about reproducible builds and how they strengthen software supply chain security, but it's quite another to effectively configure a reproducible build. Concrete steps for specific languages are a far larger topic than can be covered in a single blog post, but today we'll be talking about some guiding principles when designing reproducible builds. [ ]
The article was discussed on Hacker News.
Finally, Bernhard M. Wiedemann noticed that the GNU Helloworld project varies depending on whether it is being built during a full moon! (Reddit announcement, openSUSE bug report)

Events There will be an in-person Debian Reunion in Hamburg, Germany later this year, taking place from 23–30 May. Although this is a Debian event, there will be some folks from the broader Reproducible Builds community and, of course, everyone is welcome. Please see the event page on the Debian wiki for more information. Bernhard M. Wiedemann posted to our mailing list about a meetup for Reproducible Builds folks at the openSUSE conference in Nuremberg, Germany. It was also recently announced that DebConf22 will take place this year as an in-person conference in Prizren, Kosovo. The pre-conference meeting (or "DebCamp") will take place from 10–16 July, and the main talks, workshops, etc. will take place from 17–24 July.

Misc news Holger Levsen updated the Reproducible Builds website to improve the documentation for the SOURCE_DATE_EPOCH environment variable, both by expanding parts of the existing text [ ][ ] as well as clarifying meaning by removing text in other places [ ]. In addition, Chris Lamb added a Twitter Card to our website s metadata too [ ][ ][ ]. On our mailing list this month:

Distribution work In Debian this month:
  • Johannes Schauer Marin Rodrigues posted to the debian-devel list mentioning that he exploited the property of reproducibility within Debian to demonstrate that automatically converting a large number of packages to a new internal source version did not change the resulting packages. The proposed change could therefore be applied without causing breakage:
So now we have 364 source packages for which we have a patch and for which we can show that this patch does not change the build output. Do you agree that with those two properties, the advantages of the 3.0 (quilt) format are sufficient such that the change shall be implemented at least for those 364? [ ]
In openSUSE, Bernhard M. Wiedemann posted his usual monthly reproducible builds status report.

Tooling diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 207, 208 and 209 to Debian unstable, as well as made the following changes to the code itself:
  • Update minimum version of Black to prevent test failure on Ubuntu jammy. [ ]
  • Updated the R test fixture for the 4.2.x series of the R programming language. [ ]
Brent Spillner also worked on adding graceful handling for UNIX sockets and named pipes to diffoscope. [ ][ ][ ] Vagrant Cascadian also updated the diffoscope package in GNU Guix. [ ][ ] reprotest is the Reproducible Builds project's end-user tool to build the same source code twice in widely different environments and to check whether the binaries produced by the builds have any differences. This month, Santiago Ruano Rincón added a new --append-build-command option [ ], which was subsequently uploaded to Debian unstable by Holger Levsen.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Testing framework The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Replace a local copy of the dsa-check-running-kernel script with a packaged version. [ ]
    • Don't hide the status of offline hosts in the Jenkins shell monitor. [ ]
    • Detect undefined service problems in the node health check. [ ]
    • Update the sources.lst file for our mail server as it's still running Debian buster. [ ]
    • Add our mail server to our node inventory so it is included in the Jenkins maintenance processes. [ ]
    • Remove the debsecan package everywhere; it got installed accidentally via the Recommends relation. [ ]
    • Document the usage of the osuosl174 host. [ ]
Regular node maintenance was also performed by Holger Levsen [ ], Vagrant Cascadian [ ][ ][ ] and Mattia Rizzolo.
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

20 March 2022

Joerg Jaspert: Another shell script moved to rust

Shell? Rust! Not the first shell script I took and made a Rust version of, but probably my largest yet. This time I took my little tm (tmux helper) tool, which is (well, was) a bit more than 600 lines of shell, and converted it to Rust. I have got most of the functionality done now; only one major part is missing.

What's tm? tm started as a tiny shell script to make handling tmux easier. The first commit in git was in July 2013, but I started writing and using it in 2011. It started out as a kind-of wrapper around ssh, opening tmux windows with an ssh session on some other hosts. It quickly gained support to open multiple ssh sessions in one window, telling tmux to synchronize input (send input to all targets at once), which is great when you have a set of machines that ought to get the same commands.

tm vs clusterssh / mussh In spirit it is similar to clusterssh or mussh, allowing you to run the same command on many hosts at the same time. clusterssh sets out to open new terminals (xterm) per host and gives you an input line that it sends everywhere. mussh appears to take your command and then send it to all the hosts. Both have disadvantages in my opinion: clusterssh opens lots of xterm windows, and you can not easily switch between multiple sessions; mussh just seems to send things over ssh and be done. tm instead just creates a tmux session, telling it to ssh to the targets, possibly setting the tmux option to send input to all panes, and leaves all the rest of the handling to tmux (a rough sketch of the underlying tmux calls follows the list below). So you can
  • detach a session and reattach later easily,
  • use tmux great builtin support for copy/paste,
  • see all output, modify things even for one machine only,
  • zoom in to one machine that needs just ONE bit different (cssh can do this too),
  • let colleagues also connect to your tmux session, when needed,
  • easily add more machines to the mix, if needed,
  • and all the other extra features tmux brings.
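For illustration, the kind of tmux plumbing that tm wraps might look roughly like this; the host names are placeholders and tm's real invocations differ in detail:
# One session, one window, one pane per host, with input sent to all panes
tmux new-session -d -s cluster 'ssh host1.example.org'
tmux split-window -t cluster 'ssh host2.example.org'
tmux split-window -t cluster 'ssh host3.example.org'
tmux select-layout -t cluster tiled
tmux set-window-option -t cluster synchronize-panes on
tmux attach -t cluster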

More tm tm also supports just attaching to existing sessions as well as killing sessions, mostly out of laziness (less to type than using tmux directly). At some point tm gained support for setting up sessions according to some "session file". It knows two formats now: one is simple and mostly a list of hostnames to open synchronized sessions for. This may contain LIST commands, which let tm execute that command, with the expected output being a list of hostnames (or more LIST commands) for the session. That, combined with the replacement part, lets us have one config file that opens a set of VMs based on tags in our Ganeti clusters. It is simply a LIST command asking for VMs tagged with the replacement arg, and up comes the session. Very handy. Or also "all VMs on host X". The second format is basically "free-form tmux commands": mostly command-line tmux calls, just with the leading tmux dropped. Both of them support a crude variable replacement.

Conversion to Rust Some while ago I started playing with Rust and it somehow "clicked"; I do like it. My local git tells me that I tried starting off with Go in 2017, but that apparently did not work out. Funny, as everything I read says that Rust ought to be harder to learn. So by now I have most of the functionality implemented in the Rust version, even if I am sure that the code isn't a good Rust example. I'm learning, after all, and have already adjusted big parts of it, multiple times, whenever I learn (and understand) something more - and am also sure that this will happen again.

Compatibility with old tm It turns out that my goal of staying compatible with the behaviour of the old shell script does make some things rather complicated. For example, the LIST commands in session config files - in shell I just execute those commands, and shell deals with variable/parameter expansion; I just set IFS to newline only and read in what I get back. Simple, because shell is doing a lot of things for me. Now, in Rust, it is a different thing altogether:
  • Properly splitting the line into shell words, taking care of quoting (one can't simply split on whitespace) (there is shlex)
  • Expanding specials like ~ and $HOME (there is home_dir).
  • Supporting environment variables in general; tm has some that adjust its behaviour, which shell can use globally. I used lazy_static for a similar effect - they aren't going to change at runtime ever, anyway.
Properly supporting the commandline arguments also turned out to be a bit more work. Rust apparently has multiple crates supporting this; I settled on clap, but as tm supports getopts-style as well as free-form arguments (subcommands in clap), it takes a bit to get that interpreted right.

Speed Speed is most of the time entirely unimportant in a tool like tm (opening a tmux session with one or a few ssh connections to some places is not exactly hard or time-consuming), but there are situations where one can notice that it's calling out to tmux over and over again, for every single bit to do, and that just takes time: configurations that open sessions to 20 and more hosts at the same time especially lag in setup time. (My largest setup goes to 443 panes in one window.) The compiled Rust version is so much faster there, it's just great. Nice side effect, that is. And yes, in the end it is also only driving tmux; still, it takes less than half the time to do so.

Code, Fun parts As this is still me learning to write Rust, I am sure the code has lots to improve. Some of which I will surely find on my own, but if you have time, I love PRs (or just mails with hints).

GitHub Also the first time I used GitHub Actions, to see how it goes. Letting it build, test, run clippy and also run a code coverage tool (yay, more than 50% covered!) on it. Unsure my tests are good, I am not used to writing tests for code, but hey, coverage!

Up next I do have to implement the last missing feature, which is reading the other config file format. A little scared, as that means somehow translating those lines into correct calls within the tmux_interface I am using; not sure that is easy. I could be bad and just shell out to tmux on it all the time, but somehow I don't like the thought of doing that. Maybe (ab)using the control mode, but then, why would I use tmux_interface, so trying to handle it with that first. Afterwards I want to gain a new command, to save existing sessions and be able to recreate them easily. Shouldn't be too hard, tmux has a way to get at that info, somewhere.

9 March 2022

Jonathan Dowland: Broken webcam aspect ratio

[Image: Sony RX100-III, relegated to a webcam]
Sometimes I have remote meetings with Google Meet. Unlike the other video-conferencing services that I use (Bluejeans, Zoom), my video was stretched out of proportion under Google Meet with Firefox. I haven't found out why this was happening, but I did figure out a work-around. Thanks to Daniel Silverstone, Rob Kendrick, Gregor Herrmann and Ben Allen for pointing me in the right direction! Hardware The lovely Sony RX-100 mk3 that I bought in 2015 has spent most of its life languishing unused. During the pandemic, once I was working from home all the time, I decided to press-gang it into service as a better-quality webcam. Newer models of this camera (the mark 4 onwards) have support for a USB mode called "PC Remote", which effectively makes them into webcams. Unfortunately my mark 3 does not support this, but it does have HDMI out, so I picked up a cheap "HDMI to USB Video Capture Card" from eBay. Video modes
[Image: Before: wrong aspect ratio]
This device offers a selection of different video modes over a webcam interface. I used qv4l2 to explore the different modes. It became clear that the camera was outputting a signal at 16:9, but the modes on offer from the dongle were for a range of different aspect ratios. The picture for these other ratios was not letter or pillar-boxed, but stretched to fit. I also noticed that the modes which had the correct aspect ratio were at very low framerates: 1920x1080@5fps, 1360x768@8fps, 1280x720@10fps. It felt to me that I would look unnatural at such a low framerate. The most promising mode was close to the right ratio, 720x480 and 30 fps. Software
[Image: After: corrected aspect ratio]
My initial solution is to use the v4l2loopback kernel module, which provides a virtual loop-back webcam interface. I can write video data to it from one process, and read it back from another. Loading it as follows:
modprobe v4l2loopback exclusive_caps=1
The option exclusive_caps configures the module into a mode where it initially presents a write-only interface, but once a process has opened a file handle, it then switches to read-only for subsequent processes. Assuming there are no other camera devices connected at the time of loading the module, it will create /dev/video0.[1] I experimented briefly with OBS Studio, the very versatile and feature-full streaming tool, which confirmed that I could use filters on the source video to fix the aspect ratio, and emit the result to the virtual device. I don't otherwise use OBS, though, so I achieve the same result using ffmpeg:
ffmpeg -s 720x480 -i /dev/video1 -r 30 -f v4l2 -vcodec rawvideo \
    -pix_fmt yuyv422 -s 720x405 /dev/video0
The source options are to select the source video mode I want. The codec and pixel formats are to match what is being emitted (I determined that using ffprobe on the camera device). The resizing is triggered by supplying a different size to the -s parameter. I think that is equivalent to explicitly selecting a "scale" filter, and there might be other filters that could be used instead (to add pillar boxes for example). This worked just as well. In Google Meet, I select the Virtual Camera, and Google Meet is presented with only one video mode, in the correct aspect ratio, and no configurable options for it, so it can't misbehave. Future I'm planning to automate the loading (and unloading) of the module and starting the ffmpeg process in response to the real camera device being plugged or unplugged, using systemd events and services. (I don't leave the camera plugged in all the time due to some bad USB behaviour I've experienced if I do so.) If I get that working, I will write a follow-up.

  [1] you can request a specific device name/number with another module option.

5 March 2022

Reproducible Builds: Reproducible Builds in February 2022

Welcome to the February 2022 report from the Reproducible Builds project. In these reports, we try to round-up the important things we and others have been up to over the past month. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.
Jiawen Xiong, Yong Shi, Boyuan Chen, Filipe R. Cogo and Zhen Ming Jiang have published a new paper titled Towards Build Verifiability for Java-based Systems (PDF). The abstract of the paper contains the following:
Various efforts towards build verifiability have been made to C/C++-based systems, yet the techniques for Java-based systems are not systematic and are often specific to a particular build tool (eg. Maven). In this study, we present a systematic approach towards build verifiability on Java-based systems.

GitBOM is a flexible scheme to track the source code used to generate build artifacts via Git-like unique identifiers. Although the project has been active for a while, the community around GitBOM has now started running weekly community meetings.
The paper by Chris Lamb and Stefano Zacchiroli is now available in the March/April 2022 issue of IEEE Software. Titled Reproducible Builds: Increasing the Integrity of Software Supply Chains (PDF), the abstract of the paper contains the following:
We first define the problem, and then provide insight into the challenges of making real-world software build in a reproducible manner, that is, when every build generates bit-for-bit identical results. Through the experience of the Reproducible Builds project making the Debian Linux distribution reproducible, we also describe the affinity between reproducibility and quality assurance (QA).

In openSUSE, Bernhard M. Wiedemann posted his monthly reproducible builds status report.
On our mailing list this month, Thomas Schmitt started a thread around the SOURCE_DATE_EPOCH specification related to formats that cannot help embedding a potentially timezone-specific timestamp. (Full thread index.)
The Yocto Project is pleased to report that its core metadata (OpenEmbedded-Core) is now reproducible for all recipes (100% coverage) after issues with newer languages such as Golang were resolved. This was announced in their recent Year in Review publication. It is of particular interest for security updates, as systems can have specific components updated while reducing the risk of other unintended changes, and it makes the sections of the system that are changing very clear for audit. The project is now also making heavy use of equivalence of build output to determine whether further items in builds need to be rebuilt or whether cached previously-built items can be used. As mentioned in the article above, there are now public servers sharing this equivalence information. Reproducibility is key in making this possible and effective in reducing build times, costs and resource usage.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 203, 204, 205 and 206 to Debian unstable, as well as made the following changes to the code itself:
  • Bug fixes:
    • Fix a file(1)-related regression where Debian .changes files that contained non-ASCII text were not identified as such, therefore resulting in seemingly arbitrary packages not actually comparing the nested files themselves. The non-ASCII parts were typically in the Maintainer or in the changelog text. [ ][ ]
    • Fix a regression when comparing directories against non-directories. [ ][ ]
    • If we fail to scan using binwalk, return False from BinwalkFile.recognizes. [ ]
    • If we fail to import binwalk, don't report that we are missing the Python rpm module! [ ]
  • Testsuite improvements:
    • Add a test for recent file(1) issue regarding .changes files. [ ]
    • Use our assert_diff utility where we can within the test_directory.py set of tests. [ ]
    • Don't run our binwalk-related tests as root or fakeroot. The latest version of binwalk has some new security protection against this. [ ]
  • Codebase improvements:
    • Drop the _PATH suffix from module-level globals that are not paths. [ ]
    • Tidy some control flow in Difference._reverse_self. [ ]
    • Don't print a warning to the console regarding NT_GNU_BUILD_ID changes. [ ]
In addition, Mattia Rizzolo updated the Debian packaging to ensure that diffoscope and diffoscope-minimal packages have the same version. [ ]

Website updates There were quite a few changes to the Reproducible Builds website and documentation this month as well, including:
  • Chris Lamb:
    • Considerably rework the Who is involved? page. [ ][ ]
    • Move the contributors.sh Bash/shell script into a Python script. [ ][ ][ ]
  • Daniel Shahaf:
    • Try a different Markdown footnote content syntax to work around a rendering issue. [ ][ ][ ]
  • Holger Levsen:
    • Make a huge number of changes to the Who is involved? page, including pre-populating a large number of contributors who cannot be identified from the metadata of the website itself. [ ][ ][ ][ ][ ]
    • Improve linking to sponsors in sidebar navigation. [ ]
    • Drop the sponsors paragraph as the navigation is clearer now. [ ]
    • Add Mullvad VPN as a bronze-level sponsor. [ ][ ]
  • Vagrant Cascadian:

Upstream patches The Reproducible Builds project attempts to fix as many currently-unreproducible packages as possible. February s patches included the following:

Testing framework The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Daniel Golle:
    • Update the OpenWrt configuration to not depend on the host LLVM, adding lines to the .config seed to build LLVM for eBPF from source. [ ]
    • Preserve more OpenWrt-related build artifacts. [ ]
  • Holger Levsen:
    • Temporarily use a different Git tree when building OpenWrt as our tests had been broken since September 2020. This was reverted after the patch in question was accepted by Paul Spooren into the canonical openwrt.git repository the next day.
    • Various improvements to debugging OpenWrt reproducibility. [ ][ ][ ][ ][ ]
    • Ignore useradd warnings when building packages. [ ]
    • Update the script that powercycles armhf architecture nodes to add a hint for nodes named virt-*. [ ]
    • Update the node health check to also fix failed logrotate and man-db services. [ ]
  • Mattia Rizzolo:
    • Update the website job after contributors.sh script was rewritten in Python. [ ]
    • Make sure to set the DIFFOSCOPE environment variable when available. [ ]
  • Vagrant Cascadian:
    • Various updates to the diffoscope timeouts. [ ][ ][ ]
Node maintenance was also performed by Holger Levsen [ ] and Vagrant Cascadian [ ].

Finally If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

3 March 2022

Joerg Jaspert: Scan for SSH private keys without passphrase

SSH private key scanner (keys without passphrase) So for policy reasons, a customer wanted to ensure that every SSH private key in use by a human on their systems has a passphrase set, and asked us to make sure this is the case. There is no way in SSH to check this during connection, so the client side needs to be looked at, which means looking at actual files on the system. Turns out there are multiple formats for the private keys - and I really do not want to implement something able to deal with all of them on my own. OpenSSH to the rescue: it ships a little tool, ssh-keygen, most commonly known for its ability to generate SSH keys. But it can do much more with keys. One action is interesting here for our case: the ability to print out the public key for a given private key. For a key that is unprotected, this will just work; a key with a passphrase instead leads to it asking you for one. So we have our way to check if a key is protected by a passphrase. Now we only need to find all possible keys (note, the requirement is not "keys in .ssh/", but all possible keys, so we need to scan for them). But we do not want to run ssh-keygen on just any file; we would like to do it only when we are halfway sure that it is actually a key. Well, turns out that even though SSH has multiple formats, they all appear to have the string PRIVATE KEY somewhere very early (usually in the first line), and they are tiny - even a 16384-bit RSA key is just above 12000 bytes long. Let's find every file that's less than 13000 bytes and has the magic string in it, and throw it at ssh-keygen - if we get a public key back, flag it. Also, we supply a random (oh well, hardcoded) passphrase, to avoid it prompting for any. Scanning the whole system, one will find quite a surprising number of unprotected SSH keys. Well, a better description is possibly "unprotected RSA private keys", so the output does need to be checked by a human. This, of course, can be done in shell quite simply, so I wrote some Rust code instead, as I am still on my task to try and learn more of it. If you are interested, you can find sshprivscanner and play with it; patches/fixes/whatever welcome.
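For illustration only, here is a minimal sketch of that approach in Go (not the actual sshprivscanner code; the size limit, magic string and dummy passphrase constants are the assumptions described in the post):

package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

const (
	maxSize    = 13000         // private keys are tiny; skip anything larger
	magic      = "PRIVATE KEY" // appears very early in every key format
	passphrase = "definitely-not-the-passphrase"
)

func main() {
	filepath.Walk("/", func(path string, info os.FileInfo, err error) error {
		if err != nil || !info.Mode().IsRegular() || info.Size() > maxSize {
			return nil
		}
		data, err := os.ReadFile(path)
		if err != nil || !bytes.Contains(data, []byte(magic)) {
			return nil
		}
		// ssh-keygen -y prints the public key; with a bogus passphrase it can
		// only succeed if the private key has no passphrase at all.
		out, err := exec.Command("ssh-keygen", "-y", "-P", passphrase, "-f", path).Output()
		if err == nil && len(out) > 0 {
			fmt.Printf("unprotected key: %s\n", path)
		}
		return nil
	})
}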

5 February 2022

Reproducible Builds: Reproducible Builds in January 2022

Welcome to the January 2022 report from the Reproducible Builds project. In our reports, we try to outline the most important things that have been happening in the past month. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.
An interesting blog post was published by Paragon Initiative Enterprises about Gossamer, a proposal for securing the PHP software supply-chain. Utilising code-signing and third-party attestations, Gossamer aims to mitigate the risks within the notorious PHP world via publishing attestations to a transparency log. Their post, titled Solving Open Source Supply Chain Security for the PHP Ecosystem goes into some detail regarding the design, scope and implementation of the system.
This month, the Linux Foundation announced SupplyChainSecurityCon, a conference focused on exploring the security threats affecting the software supply chain, sharing best practices and mitigation tactics. The conference is part of the Linux Foundation's Open Source Summit North America and will take place June 21st-24th 2022, both virtually and in Austin, Texas.

Debian There was significant progress made in the Debian Linux distribution this month, including:

Other distributions kpcyrd reported on Twitter about the release of version 0.2.0 of pacman-bintrans, an experiment with binary transparency for the Arch Linux package manager, pacman. This new version is now able to query rebuilderd to check if a package was independently reproduced.
In the world of openSUSE, however, Bernhard M. Wiedemann posted his monthly reproducible builds status report.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 199, 200, 201 and 202 to Debian unstable (that were later backported to Debian bullseye-backports by Mattia Rizzolo), as well as made the following changes to the code itself:
  • New features:
    • First attempt at incremental output support with a timeout. Now passing, for example, --timeout=60 will mean that diffoscope will not recurse into any sub-archives after 60 seconds total execution time has elapsed. Note that this is not a fixed/strict timeout due to implementation issues. [ ][ ]
    • Support both variants of odt2txt, including the one provided by the unoconv package. [ ]
  • Bug fixes:
    • Do not return with a UNIX exit code of 0 if we encounter a file whose human-readable metadata matches the literal file contents. [ ]
    • Don't fail if comparing a nonexistent file with a .pyc file (and add test). [ ][ ]
    • If the debian.deb822 module raises any exception on import, re-raise it as an ImportError. This should fix diffoscope on some Fedora systems. [ ]
    • Even if a Sphinx .inv inventory file is labelled "The remainder of this file is compressed using zlib", it might not actually be. In this case, don't traceback and simply return the original content. [ ]
  • Documentation:
    • Improve documentation for the new --timeout option due to a few misconceptions. [ ]
    • Drop reference in the manual page claiming the ability to compare non-existent files on the command-line. (This has not been possible since version 32 which was released in September 2015). [ ]
    • Update "X has been modified after NT_GNU_BUILD_ID has been applied" messages to, for example, avoid duplicating the full filename in the diffoscope output. [ ]
  • Codebase improvements:
    • Tidy some control flow. [ ]
    • Correct a recompile typo. [ ]
In addition, Alyssa Ross fixed the comparison of CBFS names that contain spaces [ ], Sergei Trofimovich fixed whitespace for compatibility with version 21.12 of the Black source code reformatter [ ] and Zbigniew Jędrzejewski-Szmek fixed JSON detection with a new version of file [ ].

Testing framework The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Frédéric Pierret (fepitre):
    • Add Debian bookworm to package set creation. [ ]
  • Holger Levsen:
    • Install the po4a package where appropriate, as it is needed for the Reproducible Builds website job [ ]. In addition, also run the i18n.sh and contributors.sh scripts [ ].
    • Correct some grammar in Debian live image build output. [ ]
    • Shell monitor improvements:
      • Only show the offline node section if there are offline nodes. [ ]
      • Colorise offline nodes. [ ]
      • Shrink screen usage. [ ][ ][ ]
    • Node health check improvements:
      • Detect if live package builds encounter incomplete snapshots. [ ][ ][ ]
      • Detect if a host is running with today's date (when it should be set artificially in the future). [ ]
    • Use the devscripts package from bullseye-backports on Debian nodes. [ ]
    • Use the Munin monitoring package from bullseye-backports on Debian nodes too. [ ]
    • Update New Year handling, needed to be able to detect real and fake dates. [ ][ ]
    • Improve the error message of the script that powercycles the arm64 architecture nodes hosted by Codethink. [ ]
  • Mattia Rizzolo:
    • Use the new --timeout option added in diffoscope version 202. [ ]
  • Roland Clobus:
    • Update the build scripts, as the hooks for live builds are now maintained upstream in the live-build repository. [ ]
    • Show info lines in Jenkins when reproducible hooks have been active. [ ]
    • Use unique folders for the artifacts from each live Debian version. [ ]
  • Vagrant Cascadian:
    • Switch the Debian armhf architecture nodes to use new proxy. [ ]
    • Misc. node maintenance. [ ].

Upstream patches The Reproducible Builds project attempts to fix as many currently-unreproducible packages as possible. In January, we wrote a large number of such patches, including:

And finally If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

5 January 2022

Reproducible Builds: Reproducible Builds in December 2021

Welcome to the December 2021 report from the Reproducible Builds project! In these reports, we try and summarise what we have been up to over the past month, as well as what else has been occurring in the world of software supply-chain security. As a quick recap of what reproducible builds is trying to address, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries. The motivation behind the reproducible builds effort is to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. As always, if you would like to contribute to the project, please get in touch with us directly or visit the Contribute page on our website.
Early in December, Julien Voisin blogged about setting up a rebuilderd instance in order to reproduce Tails images. Building on previous work from 2018, Julien has now set up a public-facing instance which is providing build attestations. As Julien dryly notes in his post, "Currently, this isn't really super-useful to anyone, except maybe some Tails developers who want to check that the release manager didn't backdoor the released image." Naturally, we would contend sincerely that this is indeed useful.
The secure/anonymous Tor browser now supports reproducible source releases. According to the project's changelog, version 0.4.7.3-alpha of Tor can now build reproducible tarballs via the make dist-reprod command. This issue was tracked via Tor issue #26299.
Fabian Keil posted a question to our mailing list this month asking how they might analyse differences in images produced with the FreeBSD and ElectroBSD's mkimg and makefs commands:
After rebasing ElectroBSD from FreeBSD stable/11 to stable/12
I recently noticed that the "memstick" images are unfortunately
still not 100% reproducible.
Fabian's original post generated a short back-and-forth with Chris Lamb regarding how diffoscope might be able to support the particular format of images generated by this command set.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 195, 196, 197 and 198 to Debian, as well as making the following changes:
  • Support showing Ordering differences only within .dsc field values. [ ]
  • Add support for XMLb files. [ ]
  • Also add, for example, /usr/lib/x86_64-linux-gnu to our local binary search path. [ ]
  • Support OCaml versions 4.11, 4.12 and 4.13. [ ]
  • Drop some unnecessary has_same_content_as logging calls. [ ]
  • Replace token variable with an anonymously-named variable instead to remove extra lines. [ ]
  • Don't use the runtime platform's native endianness when unpacking .pyc files. This fixes test failures on big-endian machines. [ ]
Mattia Rizzolo also made a number of changes to diffoscope this month as well, such as:
  • Also recognize GnuCash files as XML. [ ]
  • Support the pgpdump PGP packet visualiser version 0.34. [ ]
  • Ignore the new Lintian tag binary-with-bad-dynamic-table. [ ]
  • Fix the Enhances field in debian/control. [ ]
Finally, Brent Spillner fixed the version detection for Black uncompromising code formatter [ ], Jelle van der Waa added an external tool reference for Arch Linux [ ] and Roland Clobus added support for reporting when the GNU_BUILD_ID field has been modified [ ]. Thank you for your contributions!

Distribution work In Debian this month, 70 reviews of packages were added, 27 were updated and 41 were removed, adding to our database of knowledge about specific issues. A number of issue types were created as well. Separately, strip-nondeterminism version 1.13.0-1 was uploaded to Debian unstable by Holger Levsen. It included contributions already covered in previous months as well as new ones from Mattia Rizzolo, particularly that the dh_strip_nondeterminism Debian integration interface now uses the new get_non_binnmu_date_epoch() utility when available: this is important to ensure that strip-nondeterminism does not break some kinds of binNMUs.
In the world of openSUSE, however, Bernhard M. Wiedemann posted his monthly reproducible builds status report.
In NixOS, work towards the longer-term goal of making the graphical installation image reproducible is ongoing. For example, Artturin made the gnome-desktop package reproducible.

Upstream patches The Reproducible Builds project attempts to fix as many currently-unreproducible packages as possible. In December, we wrote a large number of such patches, including:

Testing framework The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Run the Debian scheduler less often. [ ]
    • Fix the name of the Debian testing suite. [ ]
    • Detect builds that are rescheduled due to problems with the diffoscope container. [ ]
    • No longer special-case particular machines having a different /boot partition size. [ ]
    • Automatically fix failed apt-daily and apt-daily-upgrade services [ ], failed e2scrub_all.service & user@ systemd units [ ][ ] as well as generic build failures [ ].
    • Simplify a script to powercycle arm64 architecture nodes hosted at/by codethink.co.uk. [ ]
    • Detect if the udd-mirror.debian.net service is down. [ ]
    • Various miscellaneous node maintenance. [ ][ ]
  • Roland Clobus (Debian live image generation):
    • If the latest snapshot is not complete yet, try to use the previous snapshot instead. [ ]
    • Minor: whitespace correction + comment correction. [ ]
    • Use unique folders and reports for each Debian version. [ ]
    • Turn off debugging. [ ]
    • Add a better error description for incorrect/missing arguments. [ ]
    • Report non-reproducible issues in Debian sid images. [ ]
Lastly, Mattia Rizzolo updated the automatic logfile parsing rules in a number of ways (eg. to ignore a warning about the Python setuptools deprecation) [ ][ ] and Vagrant Cascadian adjusted the config for the Squid caching proxy on a node. [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

5 December 2021

Reproducible Builds: Reproducible Builds in November 2021

Welcome to the November 2021 report from the Reproducible Builds project. As a quick recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries. The motivation behind the reproducible builds effort is therefore to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. If you are interested in contributing to our project, please visit our Contribute page on our website.
On November 6th, Vagrant Cascadian presented at this year's edition of the SeaGL conference, giving a talk titled Debugging Reproducible Builds One Day at a Time:
I'll explore how I go about identifying issues to work on, learn more about the specific issues, recreate the problem locally, isolate the potential causes, dissect the problem into identifiable parts, and adapt the packaging and/or source code to fix the issues.
A video recording of the talk is available on archive.org.
Fedora Magazine published a post written by Zbigniew Jędrzejewski-Szmek about how to Use Diffoscope in packager workflows, specifically around ensuring that new versions of a package do not introduce breaking changes:
In the role of a packager, updating packages is a recurring task. For some projects, a packager is involved in upstream maintenance, or well written release notes make it easy to figure out what changed between the releases. This isn't always the case, for instance with some small project maintained by one or two people somewhere on GitHub, and it can be useful to verify what exactly changed. Diffoscope can help determine the changes between package releases. [ ]

kpcyrd announced the release of rebuilderd version 0.16.3 on our mailing list this month, adding support for builds to generate multiple artifacts at once.
Lastly, we held another IRC meeting on November 30th. As mentioned in previous reports, due to the global events throughout 2020 etc. there will be no in-person summit event this year.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made the following changes, including preparing and uploading versions 190, 191, 192, 193 and 194 to Debian:
  • New features:
    • Continue loading a .changes file even if the referenced files do not exist, but include a comment in the returned diff. [ ]
    • Log the reason if we cannot load a Debian .changes file. [ ]
  • Bug fixes:
    • Detect XML files as XML files if file(1) claims they are XML files or if they are named .xml. (#999438)
    • Don t duplicate file lists at each directory level. (#989192)
    • Don't raise a traceback when comparing nested directories with non-directories. [ ]
    • Re-enable test_android_manifest. [ ]
    • Don't reject Debian .changes files if they contain non-printable characters. [ ]
  • Codebase improvements:
    • Avoid aliasing variables if we aren't going to use them. [ ]
    • Use isinstance over type. [ ]
    • Drop a number of unused imports. [ ]
    • Update a bunch of %-style string interpolations into f-strings or str.format. [ ]
    • When pretty-printing JSON, mark the difference as being reformatted, additionally avoiding including the full path. [ ]
    • Import itertools top-level module directly. [ ]
Chris Lamb also made an update to the command-line client to trydiffoscope, a web-based version of the diffoscope in-depth and content-aware diff utility, specifically only waiting for 2 minutes for try.diffoscope.org to respond in tests. (#998360) In addition, Brandon Maier corrected an issue where parts of large diffs were missing from the output [ ], Zbigniew Jędrzejewski-Szmek fixed some logic in the assert_diff_startswith method [ ] and Mattia Rizzolo updated the packaging metadata to denote that we support both Python 3.9 and 3.10 [ ] as well as making a number of warning-related changes [ ][ ]. Vagrant Cascadian also updated the diffoscope package in GNU Guix [ ][ ].

Distribution work In Debian, Roland Clobus updated the wiki page documenting Debian reproducible Live images to mention some new bug reports and also posted an in-depth status update to our mailing list. In addition, 90 reviews of Debian packages were added, 18 were updated and 23 were removed this month adding to our knowledge about identified issues. Chris Lamb identified a new toolchain issue, absolute_path_in_cmake_file_generated_by_meson.
Work has begun on classifying reproducibility issues in packages within the Arch Linux distribution. Similar to the analogous effort within Debian (outlined above), package information is listed in a human-readable packages.yml YAML file and a sibling README.md file shows how to classify packages too. Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report for openSUSE and Vagrant Cascadian updated a link on our website to point to the GNU Guix reproducibility testing overview [ ].

Software development The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches. Elsewhere in software development, Jonas Witschel updated strip-nondeterminism, our tool to remove specific non-deterministic results from a completed build, so that it did not fail on JAR archives containing invalid members with a .jar extension [ ]. This change was later uploaded to Debian by Chris Lamb. reprotest is the Reproducible Builds project's end-user tool to build the same source code twice in widely different environments and check whether the binaries produced by the builds have any differences. This month, Mattia Rizzolo overhauled the Debian packaging [ ][ ][ ] and fixed a bug surrounding suffixes in the Debian package version [ ], whilst Stefano Rivera fixed an issue where the package tests were broken after the removal of diffoscope from the package's strict dependencies [ ].

Testing framework The Reproducible Builds project runs a testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Document the progress in setting up snapshot.reproducible-builds.org. [ ]
    • Add the packages required for debian-snapshot. [ ]
    • Make the dstat package available on all Debian based systems. [ ]
    • Mark virt32b-armhf and virt64b-armhf as down. [ ]
  • Jochen Sprickerhof:
    • Add SSH authentication key and enable access to the osuosl168-amd64 node. [ ][ ]
  • Mattia Rizzolo:
    • Revert "reproducible Debian: mark virt(32|64)b-armhf as down" - restored. [ ]
  • Roland Clobus (Debian live image generation):
    • Rename sid internally to unstable until an issue in the snapshot system is resolved. [ ]
    • Extend testing to include Debian bookworm too. [ ]
    • Automatically create the Jenkins view to display jobs related to building the Live images. [ ]
  • Vagrant Cascadian:
    • Add a Debian package set group for the packages and tools maintained by the Reproducible Builds maintainers themselves. [ ]


If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

3 December 2021

Paul Tagliamonte: Transmitting BPSK symbols (Part 2/5)

This post is part of a series called "PACKRAT". If this is the first post you've found, it'd be worth reading the intro post first and then looking over all posts in the series.
In the last post, we worked through what IQ is, and different formats that it may be sent or received in. Let's take that and move on to Transmitting BPSK using IQ data! When we transmit and receive information through RF using an SDR, data is traditionally encoded into a stream of symbols which are then used by a program to modulate the IQ stream, and sent over the airwaves. PACKRAT uses BPSK to encode Symbols through RF. BPSK is the act of modulating the phase of a sine wave to carry information. The transmitted wave swaps between two states in order to convey a 0 or a 1. Our symbols modulate the transmitted sine wave's phase, so that it moves between in-phase with the SDR's transmitter and 180 degrees (or π radians) out of phase with the SDR's transmitter. The difference between a Bit and a Symbol in PACKRAT is not incredibly meaningful, and I'll often find myself slipping up when talking about them. I've done my best to try and use the right word at the right stage, but it's not as obvious where the line between bit and symbol is, at least not as obvious as it would be with QPSK or QAM. The biggest difference is that there are three meaningful states for PACKRAT over BPSK: a 1 (for "in phase"), -1 (for "180 degrees out of phase") and 0 (for "no carrier"). For my implementation, a stream of all zeros will not transmit data over the airwaves, a stream of all 1s will transmit all 1 bits over the airwaves, and a stream of all -1s will transmit all 0 bits over the airwaves. We're not going to cover turning a byte (or bit) into a symbol yet; I'm going to write more about that in a later section. So for now, let's just worry about symbols in, and symbols out.

Transmitting a Sine wave at 0Hz If we go back to thinking about IQ data as precisely timed measurements of energy over time at some particular specific frequency, we can consider what a sine wave will look like in IQ. Before we dive into antennas and RF, let's go to something a bit more visual. For the first example, you can see an example of a camera whose frame rate (or Sampling Rate!) matches the exact number of rotations per second (or Frequency!) of the propeller, and it appears to stand exactly still. Every time the camera takes a frame, it's catching the propeller in the exact same place in space, even though it's made a complete rotation. The second example is very similar: it's a light strobing (in this case, our sampling rate, since the darkness is ignored by our brains) at the same rate (frequency) as water dropping from a faucet, and the video creator is even nice enough to change the sampling frequency to have the droplets move both forward and backward (positive and negative frequency) in comparison to the faucet. IQ works the same way. If we catch something in perfect frequency alignment with our radio, we'll wind up with readings that are the same for the entire stream of data. This means we can transmit a sine wave by setting all of the IQ samples in our buffer to 1+0i, which will transmit a pure sine wave at exactly the center frequency of the radio.
var sine []complex128
for i := range sine {
    sine[i] = complex(1.0, 0.0)
}
Alternatively, we can transmit a Sine wave (but with the opposite phase) by flipping the real value from 1 to -1. The same Sine wave is transmitted on the same Frequency, except when the wave goes high in the example above, the wave will go low in the example below.
var sine []complex128
for i := range sine {
    sine[i] = complex(-1.0, 0.0)
}
In fact, we can make a carrier wave at any phase angle and amplitude by using a bit of trig.
// angle is in radians - here we have
// 1.5 Pi (3/4 Tau) or 270 degrees.
var angle = math.Pi * 1.5
// amplitude controls the transmitted
// strength of the carrier wave.
var amplitude = 1.0
// output buffer as above
var sine []complex128
for i := range sine {
    sine[i] = complex(
        amplitude*math.Cos(angle),
        amplitude*math.Sin(angle),
    )
}
The amplitude of the transmitted wave is the absolute value of the IQ sample (sometimes called magnitude), and the phase can be computed as the angle (or argument). The amplitude remains constant (at 1) in both cases. Remember back to the airplane propeller or water droplets: we're controlling where we're observing the sine wave. It looks like a consistent value to us, but in reality it's being transmitted as a pure carrier wave at the provided frequency. Changing the angle of the number we're transmitting will control where in the sine wave cycle we're observing it.
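As a quick sanity check (my own addition, not from the original post), Go's math/cmplx package can recover the magnitude and phase from the sample constructed above:

package main

import (
	"fmt"
	"math"
	"math/cmplx"
)

func main() {
	angle := math.Pi * 1.5 // 270 degrees, as in the example above
	amplitude := 1.0
	sample := complex(amplitude*math.Cos(angle), amplitude*math.Sin(angle))

	fmt.Println(cmplx.Abs(sample))   // magnitude: 1
	fmt.Println(cmplx.Phase(sample)) // angle: about -Pi/2, i.e. 270 degrees
}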

Generating BPSK modulated IQ data Modulating our carrier wave with our symbols is fairly straightforward to do: we can multiply the symbol by 1 to get the real value to be used in the IQ stream. Or, more simply, we can just use the symbol directly in the constructed IQ data.
var sampleRate = 2_621_440
var baudRate = 1024
// This represents the number of IQ samples
// required to send a single symbol at the
// provided baud and sample rate. I picked
// two numbers in order to avoid half samples.
// We will transmit each symbol in blocks of
// this size.
var samplesPerSymbol = sampleRate / baudRate
var samples = make([]complex128, samplesPerSymbol)
// symbol is one of 1, -1 or 0.
for _, symbol := range symbols {
    for i := range samples {
        samples[i] = complex(symbol, 0)
    }
    // write the samples out to an output file
    // or radio.
    write(samples)
}
If you want to check against a baseline capture, here are 10 example packets at 204800 samples per second.

Next Steps Now that we can transmit data, we'll start working on a receive path, in order to check our work when transmitting the packets, as well as being able to hear packets we transmit from afar, coming up next in Part 3!

2 December 2021

Paul Tagliamonte: Processing IQ data formats (Part 1/5)

This post is part of a series called "PACKRAT". If this is the first post you've found, it'd be worth reading the intro post first and then looking over all posts in the series.
When working with SDRs, information about the signals your radio is receiving is communicated by streams of IQ data. IQ is short for "In-phase" and "Quadrature", which means 90 degrees out of phase. Values in the IQ stream are complex numbers, so converting them to a native complex type in your language helps greatly when processing the IQ data for meaning. I won't get too deep into what IQ is or why complex numbers (mostly since I don't think I fully understand it well enough to explain it yet), but here are some basics in case this is your first interaction with IQ data, before going off and reading more.
Before we get started at any point, if you feel lost in this post, it's OK to take a break to do a bit of learning elsewhere in the internet. I'm still new to this, so I'm sure my overview in one paragraph here won't help clarify things too much. This took me months to sort out on my own. It's not you, really! I particularly enjoyed reading visual-dsp.switchb.org when it came to learning about how IQ represents signals, and Software-Defined Radio for Engineers for a more general reference.
Each value in the stream is taken at a precisely spaced sampling interval (called the sampling rate of the radio). Jitter in that sampling interval, or a drift between the requested and actual sampling rate (usually represented in PPM, or parts per million: how many samples out of one million are missing) can cause errors in frequency. In the case of a PPM error, one radio may think it's 100.1MHz and the other may think it's 100.2MHz, and jitter will result in added noise in the resulting stream. A single IQ sample is both the real and imaginary values, together. The complex number (both parts) is the sample. The number of samples per second is the number of real and imaginary value pairs per second. Each sample is reading the electrical energy coming off the antenna at that exact time instant. We're looking to see how that goes up and down over time to determine what frequencies we're observing around us. If the IQ stream is only real-valued measures (e.g., float values rather than complex values reading voltage from a wire), you can still send and receive signals, but those signals will be mirrored across your 0Hz boundary. That means if you're tuned to 100MHz, and you have a nearby transmitter at 99.9MHz, you'd see it at 100.1MHz. If you want to get an intuitive understanding of this concept before getting into the heavy math, a good place to start is looking at how Quadrature encoders work. Using complex numbers means we can see up in frequency as well as down in frequency, and understand that those are different signals. The reason why we need negative frequencies is that our 0Hz is the center of our SDR's tuned frequency, not actually at 0Hz in nature. Generally speaking, it's doing loads in hardware (and firmware!) to mix the raw RF signals with a local oscillator to a frequency that can be sampled at the requested rate (fundamentally the same concept as a superheterodyne receiver), so a frequency of -10MHz means that signal is 10 MHz below the center of our SDR's tuned frequency. The sampling rate dictates the amount of frequency representable in the data stream. You'll sometimes see this called the Nyquist frequency. The Nyquist Frequency is one half of the sampling rate. Intuitively, if you think about the amount of bandwidth observable as being 1:1 with the sampling rate of the stream, and the middle of your bandwidth is 0 Hz, you would only have enough space to go up in frequency for half of your bandwidth, or half of your sampling rate. Same for going down in frequency.
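To make that concrete, here is a small worked example (with made-up numbers, not anything from the post): an SDR tuned to 100MHz sampling at 2,048,000 samples per second can represent roughly 1.024MHz above and below its center frequency.

package main

import "fmt"

func main() {
	center := 100_000_000.0   // Hz, where the SDR is tuned
	sampleRate := 2_048_000.0 // samples (IQ pairs) per second

	nyquist := sampleRate / 2 // half the sample rate in each direction
	fmt.Printf("observable band: %.3f MHz to %.3f MHz\n",
		(center-nyquist)/1e6, (center+nyquist)/1e6)
}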

Float 32 / Complex 64 IQ samples that are being processed by software are commonly processed as an interleaved pair of 32 bit floating point numbers, or a 64 bit complex number. The first float32 is the real value, and the second is the imaginary value.
I#0
Q#0
I#1
Q#1
I#2
Q#2
The complex number 1+1i is represented as 1.0 1.0 and the complex number -1-1i is represented as -1.0 -1.0. Unless otherwise specified, all the IQ samples and pseudocode to follow assume interleaved float32 IQ data streams. Example interleaved float32 file (10Hz Wave at 1024 Samples per Second)
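The post gives conversion pseudocode for the integer formats below; for symmetry, here is the same thing for interleaved float32 data (my own sketch, not from the post): no scaling is needed, since the pair is already the real and imaginary parts.

package main

import "fmt"

func main() {
	iq := []float32{1.0, 1.0} // one interleaved sample: I then Q
	out := complex(iq[0], iq[1])
	fmt.Println(out) // (1+1i), as a complex64
}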

RTL-SDR IQ samples from the RTL-SDR are encoded as a stream of interleaved unsigned 8 bit integers (uint8 or u8). The first sample is the real (in-phase or I) value, and the second is the imaginary (quadrature or Q) value. Together each pair of values makes up a complex number at a specific time instant.
I#0
Q#0
I#1
Q#1
I#2
Q#2
The complex number 1+1i is represented as 0xFF 0xFF and the complex number -1-1i is represented as 0x00 0x00. The complex number 0+0i is not easily representable since half of 0xFF is 127.5.
Complex Number    Representation
1+1i              []uint8{0xFF, 0xFF}
-1+1i             []uint8{0x00, 0xFF}
-1-1i             []uint8{0x00, 0x00}
0+0i              []uint8{0x80, 0x80} or []uint8{0x7F, 0x7F}
And finally, here's some pseudocode to convert an rtl-sdr style IQ sample to a floating point complex number:
...
in = []uint8{0x7F, 0x7F}
real = (float(in[0])-127.5)/127.5
imag = (float(in[1])-127.5)/127.5
out = complex(real, imag)
...
Example interleaved uint8 file (10Hz Wave at 1024 Samples per Second)

HackRF IQ samples from the HackRF are encoded as a stream of interleaved signed 8 bit integers (int8 or i8). The first sample is the real (in-phase or I) value, and the second is the imaginary (quadrature or Q) value. Together each pair of values makes up a complex number at a specific time instant.
I#0
Q#0
I#1
Q#1
I#2
Q#2
Formats that use signed integers do have one quirk due to two's complement, which is that the absolute value of the smallest representable negative number is one more than the largest positive number. int8 values can range from -128 to 127, which means there's a bit of ambiguity in how +1, 0 and -1 are represented. Either you can create perfectly symmetric ranges of values between +1 and -1 (but then 0 is not representable), have more possible values in the negative range, or allow values above (or just below) the maximum of the range. Within my implementation, my approach has been to scale based on the maximum integer value of the type, so the lowest possible signed value is actually slightly smaller than -1. Generally, if your code is seeing values that low, the difference in step between -1 and slightly less than -1 isn't very significant, even with only 8 bits. Just a curiosity to be aware of.
Complex Number    Representation
1+1i              []int8{127, 127}
-1+1i             []int8{-128, 127}
-1-1i             []int8{-128, -128}
0+0i              []int8{0, 0}
And finally, here's some pseudocode to convert a hackrf style IQ sample to a floating point complex number:
...
in = []int8{-5, 112}
real = (float(in[0]))/127
imag = (float(in[1]))/127
out = complex(real, imag)
...
Example interleaved int8 file (10Hz Wave at 1024 Samples per Second)

PlutoSDR IQ samples from the PlutoSDR are encoded as a stream of interleaved signed 16 bit integers (int16 or i16). The first sample is the real (in-phase or I) value, and the second is the imaginary (quadrature or Q) value. Together each pair of values makes up a complex number at a specific time instant. Almost no SDRs capture at a 16 bit depth natively; often you'll see 12 bit integers (as is the case with the PlutoSDR) being sent around as 16 bit integers. This leads to the next possible question: are values LSB or MSB aligned? The PlutoSDR sends data LSB aligned (which is to say, the largest real or imaginary value in the stream will not exceed 4095), but expects data being transmitted to be MSB aligned (which is to say the lowest set bit possible is the 5th bit in the number, or values can only be set in increments of 16). As a result, the quirk observed with the HackRF (that the range of values between 0 and -1 is different than the range of values between 0 and +1) does not impact us so long as we do not use the whole 16 bit range.
Complex Number    Representation
1+1i              []int16{32767, 32767}
-1+1i             []int16{-32768, 32767}
-1-1i             []int16{-32768, -32768}
0+0i              []int16{0, 0}
And finally, here's some pseudocode to convert a PlutoSDR style IQ sample to a floating point complex number, including moving the sample from LSB to MSB aligned:
...
in = []int16{-15072, 496}
// shift left 4 bits (16 bits - 12 bits = 4 bits)
// to move from LSB aligned to MSB aligned.
in[0] = in[0] << 4
in[1] = in[1] << 4
real = (float(in[0]))/32767
imag = (float(in[1]))/32767
out = complex(real, imag)
...
Example interleaved i16 file (10Hz Wave at 1024 Samples per Second)
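Going the other direction for transmit (an assumed sketch, not from the post): scale the float back up to int16 and keep it MSB aligned by clearing the low 4 bits, so the value only changes in increments of 16 as described above.

package main

import "fmt"

func main() {
	sample := complex(0.5, -0.25)
	i := int16(real(sample)*32767) &^ 0x0F // clear the low 4 bits
	q := int16(imag(sample)*32767) &^ 0x0F
	fmt.Println(i, q) // 16368 -8192
}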

Next Steps Now that we can read (and write!) IQ data, we can get started first on the transmitter, which we can (in turn) use to test receiving our own BPSK signal, coming next in Part 2!

21 November 2021

Antoine Beaupré: The last syncmaildir crash

My syncmaildir (SMD) setup failed me one too many times (previously, previously). In an attempt to migrate to an alternative mail synchronization tool, I looked into using my IMAP server again, and found out my mail spool was in pretty bad shape. I'm comparing mbsync and offlineimap in the next post, but this post talks about how I recovered the mail spool so that tools like those could correctly synchronise it again.

The latest crash On Monday, SMD just started failing with this error:
nov 15 16:12:19 angela systemd[2305]: Starting pull emails with syncmaildir...
nov 15 16:12:22 angela systemd[2305]: smd-pull.service: Succeeded.
nov 15 16:12:22 angela systemd[2305]: Finished pull emails with syncmaildir.
nov 15 16:14:08 angela systemd[2305]: Starting pull emails with syncmaildir...
nov 15 16:14:11 angela systemd[2305]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
nov 15 16:14:11 angela systemd[2305]: smd-pull.service: Failed with result 'exit-code'.
nov 15 16:14:11 angela systemd[2305]: Failed to start pull emails with syncmaildir.
nov 15 16:16:14 angela systemd[2305]: Starting pull emails with syncmaildir...
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Network error.
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Unable to get any data from the other endpoint.
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: This problem may be transient, please retry.
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Hint: did you correctly setup the SERVERNAME variable
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: on your client? Did you add an entry for it in your ssh
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: configuration file?
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Network error
nov 15 16:16:17 angela smd-pull[27188]: register: smd-client@localhost: TAGS: error::context(handshake) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
nov 15 16:16:17 angela systemd[2305]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
nov 15 16:16:17 angela systemd[2305]: smd-pull.service: Failed with result 'exit-code'.
nov 15 16:16:17 angela systemd[2305]: Failed to start pull emails with syncmaildir.
What is frustrating is that there's actually no network error here. Running the command by hand I did see a different message, but now I have lost it in my backlog. It had something to do with a filename being too long, and I gave up debugging after a while. This happened suddenly too, which added to the confusion. In a fit of rage I started this blog post and experimenting with alternatives, which led me down a lot of rabbit holes. Reviewing my previous mail crash documentation, it seems most solutions involve talking to an IMAP server, so I figured I would just do that. Wanting to try something new, I gave isync (AKA mbsync) a try. Oh dear, I did not expect how much trouble just talking to my IMAP server would be, which wasn't isync's fault, for what that's worth. It was the primary tool I used to debug things, and served me well in that regard.

Mailbox corruption The first thing I found out is that certain messages in the IMAP spool were corrupted. mbsync would stop on a FETCH command and Dovecot would give me those errors on the server side.

"wrong W value"
nov 16 15:31:27 marcos dovecot[3621800]: imap(anarcat)<3630489><wAmSzO3QZtfAqAB1>: Error: Mailbox junk: Maildir filename has wrong W value, renamed the file from /home/anarcat/Maildir/.junk/cur/1454623938.M101164P22216.marcos,S=2495,W=2578:2,S to /home/anarcat/Maildir/.junk/cur/1454623938.M101164P22216.marcos,S=2495:2,S
nov 16 15:31:27 marcos dovecot[3621800]: imap(anarcat)<3630489><wAmSzO3QZtfAqAB1>: Error: Mailbox junk: Deleting corrupted cache record uid=1582: UID 1582: Broken virtual size in mailbox junk: read(/home/anarcat/Maildir/.junk/cur/1454623938.M101164P22216.marcos,S=2495,W=2578:2,S): FETCH BODY[] got too little data: 2540 vs 2578
At least this first error was automatically healed by Dovecot (by renaming the file without the W= flag). The problem is that the FETCH command fails and mbsync exits noisily. So you need to constantly restart mbsync with a silly command like:
while ! mbsync -a; do sleep 1; done

"cached message size larger than expected"
nov 16 13:53:08 marcos dovecot[3520770]: imap(anarcat)<3594402><M5JHb+zQ3NLAqAB1>: Error: Mailbox Sent: UID=19288: read(/home/anarcat/Maildir/.Sent/cur/1224790447.M898726P9811V000000000000FE06I00794FB1_0.marvin,S=2588:2,S) failed: Cached message size larger than expected (2588 > 2482, box=Sent, UID=19288) (read reason=mail stream)
nov 16 13:53:08 marcos dovecot[3520770]: imap(anarcat)<3594402><M5JHb+zQ3NLAqAB1>: Error: Mailbox Sent: Deleting corrupted cache record uid=19288: UID 19288: Broken physical size in mailbox Sent: read(/home/anarcat/Maildir/.Sent/cur/1224790447.M898726P9811V000000000000FE06I00794FB1_0.marvin,S=2588:2,S) failed: Cached message size larger than expected (2588 > 2482, box=Sent, UID=19288)
nov 16 13:53:08 marcos dovecot[3520770]: imap(anarcat)<3594402><M5JHb+zQ3NLAqAB1>: Error: Mailbox Sent: UID=19288: read(/home/anarcat/Maildir/.Sent/cur/1224790447.M898726P9811V000000000000FE06I00794FB1_0.marvin,S=2588:2,S) failed: Cached message size larger than expected (2588 > 2482, box=Sent, UID=19288) (read reason=)
nov 16 13:53:08 marcos dovecot[3520770]: imap-login: Panic: epoll_ctl(del, 7) failed: Bad file descriptor
This second problem is much harder to fix, because dovecot does not recover automatically. This is Dovecot complaining that the cached size (the S= field, but also present in Dovecot's metadata files) doesn't match the file size. I wonder if at least some of those messages were corrupted in the OfflineIMAP to syncmaildir migration because part of that procedure is to run the strip_header script to remove content from the emails. That could easily have broken things since the files do not also get renamed.

Workaround So I read a lot of the Dovecot documentation on the maildir format, and wrote an extensive fix script for those two errors. The script worked and mbsync was able to sync the entire mail spool. And no, rebuilding the index files didn't work. Also tried doveadm force-resync -u anarcat which didn't do anything. In the end I also had to do this, because the wrong cache values were also stored elsewhere.
service dovecot stop ; find -name 'dovecot*' -delete; service dovecot start
This would have totally broken any existing clients, but thankfully I'm starting from scratch (except maybe webmail, but I'm hoping it will self-heal as well, assuming it only has a cache and not a full replica of the mail spool).

Incoherence between Maildir and IMAP Unfortunately, the first mbsync was incomplete as it was missing about 15,000 mails:
anarcat@angela:~(main)$ find Maildir -type f -type f -a \! -name '.*' | wc -l
384836
anarcat@angela:~(main)$ find Maildir-mbsync/ -type f -a \! -name '.*' | wc -l
369221
As it turns out, mbsync was not at fault here either: this was yet more mail spool corruption. It's actually 26 folders (out of 205) with inconsistent sizes, which can be found with:
for folder in * .[^.]* ; do 
  printf "%s\t%d\n" $folder $(find "$folder" -type f -a \! -name '.*' | wc -l );
done
The special \! -name '.*' bit is to ignore the mbsync metadata, which creates .uidvalidity and .mbsyncstate in every folder. That ignores about 200 files, but since they are spread around all folders, they were making it impossible to review where the problem was. Here is what the diff looks like:
--- Maildir-list    2021-11-17 20:42:36.504246752 -0500
+++ Maildir-mbsync-list 2021-11-17 20:18:07.731806601 -0500
@@ -6,16 +6,15 @@
[...]
 .Archives  1
 .Archives.2010 3553
-.Archives.2011 3583
-.Archives.2012 12593
+.Archives.2011 3582
+.Archives.2012 620
 .Archives.2013 8576
 .Archives.2014 11057
-.Archives.2015 8173
+.Archives.2015 8165
 .Archives.2016 54
 .band  34
 .bitbuck   1
@@ -38,13 +37,12 @@
 .couchsurfers  2
-cur    11285
+cur    11280
 .current   130
 .cv    2
 .debbug    262
-.debian    37544
-drafts 1
-.Drafts    4
+.debian    37533
+.Drafts    2
 .drone 241
 .drupal    188
 .drupal-devel  303
[...]

Misfiled messages It's a bit all over the place, but we can already notice some huge differences between mailboxes, for example in the Archives folders. As it turns out, at least 12,000 of those missing mails were actually misfiled: instead of being in the Maildir/.Archives.2012/cur/ folder, they were directly in Maildir/.Archives.2012/. This is something that doesn't matter for SMD (and possibly for notmuch? it does matter, notmuch suddenly found 12,000 new mails) but that definitely matters to Dovecot and therefore mbsync... After moving those files around, we still have 4,000 messages missing:
anarcat@angela:~(main)$ find Maildir-mbsync/ -type f -a \! -name '.*' | wc -l
381196
anarcat@angela:~(main)$ find Maildir/ -type f -a \! -name '.*' | wc -l
385053
The problem is that those 4,000 missing mails are harder to track. Take, for example, .Archives.2011, which has a single message missing, out of 3,582. And the files are not identical: the checksums don't match after going through the IMAP transport, so we can't use a tool like hashdeep to compare the trees and find why any single file is missing.

"register" folder One big chunk of the 4,000, however, is a special folder called register in my spool, which I am syncing separately (see Securing registration email for details on that setup). That actually covers 3,700 of those messages, so I actually have a more modest 300 messages to figure out, after (easily!) configuring mbsync to sync that folder separately:
 @@ -30,9 +33,29 @@ Slave :anarcat-local:
  # Exclude everything under the internal [Gmail] folder, except the interesting folders
  #Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
  # Or include everything
 -Patterns *
 +#Patterns *
 +Patterns * !register  !.register
  # Automatically create missing mailboxes, both locally and on the server
  #Create Both
  Create slave
  # Sync the movement of messages between folders and deletions, add after making sure the sync works
  #Expunge Both
 +
 +IMAPAccount anarcat-register
 +Host imap.anarc.at
 +User register
 +PassCmd "pass imap.anarc.at-register"
 +SSLType IMAPS
 +CertificateFile /etc/ssl/certs/ca-certificates.crt
 +
 +IMAPStore anarcat-register-remote
 +Account anarcat-register
 +
 +MaildirStore anarcat-register-local
 +SubFolders Maildir++
 +Inbox ~/Maildir-mbsync/.register/
 +
 +Channel anarcat-register
 +Master :anarcat-register-remote:
 +Slave :anarcat-register-local:
 +Create slave

"tmp" folders and empty messages After syncing the "register" messages, I end up with the measly little 160 emails out of sync:
anarcat@angela:~(main)$ find Maildir-mbsync/ -type f -a \! -name '.*' | wc -l
384900
anarcat@angela:~(main)$ find Maildir/ -type f -a \! -name '.*' | wc -l
385059
Argh. After more digging, I have found 131 mails in the tmp/ directories of the client's mail spool. Mysterious! On the server side, it's even more files, and not the same ones. Possible that those were mails that were left there during a failed delivery of some sort, during a power failure or some sort of crash? Who knows. It could be another race condition in SMD if it runs while mail is being delivered in tmp/... The first thing to do with those is to clean up a bunch of empty files (21 on angela):
find .[^.]*/tmp -type f -empty -delete
As it turns out, they are all duplicates, in the sense that notmuch can easily find a copy of files with the same message ID in its database. In other words, this hairy command returns nothing:
find .[^.]*/tmp -type f | while read path; do
  msgid=$(grep -m 1 -i ^message-id "$path" | sed 's/Message-ID: //i;s/[<>]//g');
  if notmuch count --exclude=false "id:$msgid" | grep -q 0; then
    echo "$path <$msgid> not in notmuch" ;
  fi;
done
... which is good. Or, to put it another way, this is safe:
find .[^.]*/tmp -type f -delete
Poof! 314 mails cleaned on the server side. Interestingly, SMD doesn't pick up on those changes at all and still sees files in tmp/ directories on the client side, so we need to operate the same twisted logic there.

notmuch to the rescue again After cleaning that on the client, we get:
anarcat@angela:~(main)$ find Maildir/ -type f -a \! -name '.*' | wc -l
384928
anarcat@angela:~(main)$ find Maildir-mbsync/ -type f -a \! -name '.*' | wc -l
384901
Ha! 27 mails difference. Those are the really sticky, unclear ones. I was hoping a full sync might clear that up, but after deleting the entire directory and starting from scratch, I end up with:
anarcat@angela:~(main)$ find Maildir -type f -type f -a \! -name '.*' | wc -l
385034
anarcat@angela:~(main)$ find Maildir-mbsync -type f -type f -a \! -name '.*' | wc -l
384993
That is: even more messages missing (now 37). Sigh. Thankfully, this is something notmuch can help with: it can index all files by Message-ID (which I learned is case-insensitive, yay) and tell us which messages don't make it through. Considering the corruption I found in the mail spool, I wouldn't be the least surprised those messages are just skipped by the IMAP server. Unfortunately, there's nothing on the Dovecot server logs that would explain the discrepancy. Here again, notmuch comes to the rescue. We can list all message IDs to figure out that discrepancy:
notmuch search --exclude=false --output=messages '*' | pv -s 18M | sort > Maildir-msgids
notmuch --config=.notmuch-config-mbsync search --exclude=false --output=messages '*' | pv -s 18M | sort > Maildir-mbsync-msgids
And then we can see how many messages notmuch thinks are missing:
$ wc -l *msgids
372723 Maildir-mbsync-msgids
372752 Maildir-msgids
That's 29 messages. Oddly, it doesn't exactly match the find output:
anarcat@angela:~(main)$ find Maildir-mbsync -type f -type f -a \! -name '.*' | wc -l
385204
anarcat@angela:~(main)$ find Maildir -type f -type f -a \! -name '.*' | wc -l
385241
That is 10 more messages. Ugh. But actually, I know what those are: more misfiled messages (in a .folder/draft/ directory, bizarrely), so the totals actually match. In the notmuch output, there's a lot of stuff like this:
id:notmuch-sha1-fb880d673e24f5dae71b6b4d825d4a0d5d01cde4
Those are messages without a valid Message-ID. Notmuch (presumably) constructs one based on the file's checksum. Because the files differ between the IMAP server and the local mail spool (which is unfortunate, but possibly inevitable), those do not match. There are exactly the same number of those on both sides, so I'll go ahead and assume those are all accounted for. What remains is:
anarcat@angela:~(main)$ diff -u Maildir-mbsync-msgids Maildir-msgids | grep '^\-[^-]' | grep -v sha1 | wc -l
2
anarcat@angela:~(main)$ diff -u Maildir-mbsync-msgids Maildir-msgids | grep '^\+[^+]' | grep -v sha1 | wc -l
21
anarcat@angela:~(main)$
i.e. 21 missing from mbsync, and, surprisingly, 2 missing from the original mail spool. Further inspection also showed they were all messages with some sort of "corruption": no body and only headers. I am not sure that is a legal email format in the first place. Since they were mostly spam or administrative emails ("You have been unsubscribed from mailing list..."), it seems fairly harmless to ignore those.

Conclusion As we'll see in the next article, SMD has stellar performance. But that comes at a huge cost: it accesses the mail storage directly. This can create (and has created) significant problems on the mail server. It's unclear exactly why those things happen, but Dovecot expects a particular storage format for its files, and it seems unwise to bypass that. In the future, I'll try to remember to avoid that, especially since mechanisms like SMD require special server access (SSH) which, in the long term, I am not sure I want to maintain or expect. In other words, just talking with an IMAP server opens up a lot more hosting possibilities than setting up a custom synchronisation protocol over SSH. It's also safer and more reliable, as we have seen. Thankfully, I've been able to recover from all the errors I could find, but it could have gone differently and it would have been possible for SMD to permanently corrupt a significant part of my mail archives. In the end, however, the last drop was just another weird bug which, ironically, SMD mysteriously recovered from on its own while I was writing this documentation and migrating away from it. In any case, I recommend SMD users start looking for alternatives. The project has been archived upstream, and the Debian package has been orphaned. I have seen significant mailbox corruption, including entire mail spool destruction, mostly due to incorrect locking code. I have filed a release-critical bug in Debian to make sure it doesn't ship with Debian bookworm. Alternatives like mbsync provide fast and reliable transport, including over SSH. See the next article for further discussion of the alternatives.

6 November 2021

Reproducible Builds: Reproducible Builds in October 2021

Welcome to the October 2021 report from the Reproducible Builds project!
This month Samanta Navarro posted to the oss-security mailing list about a novel category of exploit in the .tar archive format, where a single .tar file contains different contents depending on the tar utility being used. Naturally, this has consequences for reproducible builds, as Samanta goes on to explain:

Arch Linux uses libarchive (bsdtar) in its build environment. The default tar program installed is GNU tar. It is possible to create a source distribution which leads to different files seen by the build environment than compared to a careful reviewer and other Linux distributions.
Samanta notes that addressing the tar utilities themselves will not be a sufficient fix:
I have submitted bug reports and patches to some projects but eventually I had to conclude that the problem itself cannot be fixed by these implementations alone. The best choice for these tools would be to only allow archives which are fully compatible to standards but this in turn would render a lot of archives broken.
Reproducible builds, with its twin ideas of reaching consensus on the build outputs as well as precisely recording and describing the build environment, would help address this problem at a higher level.
Codethink announced that they had achieved ISO-26262 ASIL D Tool Certification, a way of determining specific safety standards for software. Codethink used open source tooling to achieve this, but they also leverage:
Reproducibility, repeatability and traceability of builds, drawing heavily on best-practices championed by the Reproducible Builds project.

Elsewhere on the internet, according to a comment on Hacker News, Microsoft are now comparing NPM Javascript packages with their original source repositories:
I got a PR in my repository a few days ago leading back to a team trying to make it easier for packages to be reproducible from source.

Lastly, Martin Monperrus started an interesting thread on our mailing list about GitHub, specifically that their autogenerated release tarballs are not deterministic. The thread generated a significant number of replies that are worth reading.
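To illustrate the underlying problem (with a purely hypothetical repository URL): because these tarballs are generated on the fly rather than stored as fixed artifacts, a checksum recorded today is not guaranteed to match a later download of the very same tag if GitHub's git or compression toolchain changes.
# Record the checksum of an autogenerated release tarball...
curl -sL https://github.com/example/project/archive/refs/tags/v1.0.tar.gz | sha256sum
# ...re-running the same command months later may yield a different hash,
# even though the underlying tag has not moved.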

Events and presentations

Community news On our mailing list this month:
There were quite a few changes to the Reproducible Builds website and documentation this month as well, including Feng Chai updating some links on our publications page [ ] and marco updating our project metadata around the Bitcoin Core building guide [ ].
Lastly, we ran another productive meeting on IRC during October. A full set of notes from the meeting is available to view.

Distribution work Qubes was heavily featured in the latest edition of Linux Weekly News, and a significant section was dedicated to discussing reproducibility. For example, it was mentioned that the Qubes project has been working on incorporating reproducible builds into its continuous integration (CI) infrastructure . But the LWN article goes on to describe that:
The current goal is to be able to build the Qubes OS Debian templates solely from packages that can be built reproducibly. Templates in Qubes OS are VM images that can be used to start an application qube quickly based on the template. The qube will have read-only access to the root filesystem of the template, so that the same root filesystem can be shared with multiple application qubes. There are official templates for several variants of both Fedora and Debian, as well as community maintained templates for several other distributions.
You can view the whole article on LWN, and Frédéric also published a lengthy summary about their work on reproducible builds in Qubes for those wishing to learn more.
In Debian this month, 133 reviews of Debian packages were added, 81 were updated and 24 were removed, adding to Debian's ever-growing knowledge about identified issues. A number of issues were categorised and added by Chris Lamb and Vagrant Cascadian too [ ][ ][ ]. In addition, work on an alternative snapshot service by Frédéric Pierret and Holger Levsen made progress this month, including a move from the existing host (snapshot.notset.fr) to snapshot.reproducible-builds.org (more info), with thanks to OSUOSL for the machine and hosting and to Debian for the disks.
Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made the following changes, including preparing and uploading versions 186, 187, 188 and 189 to Debian:
  • New features:
    • Add support for Python Sphinx inventory files (usually named objects.inv on-disk). [ ]
    • Add support for comparing .pyc files. Thanks to Sergei Trofimovich for the inspiration. [ ]
    • Try some alternative suffixes (e.g. .py) to support distributions that strip or retain them. [ ][ ]
  • Bug fixes:
    • Fix Python decompilation tests under Python 3.10+ [ ] and for Python 3.7 [ ].
    • Don't raise a traceback if we cannot unmarshal Python bytecode. This is in order to support Python 3.7 failing to load .pyc files generated with newer versions of Python. [ ]
    • Skip Python bytecode testing where we do not have an expected diff. [ ]
  • Codebase improvements:
    • Use our file_version_is_lt utility instead of accepting both versions of uImage expected diff. [ ]
    • Split out a custom call to assert_diff for a .startswith equivalent. [ ]
    • Use skipif instead of manual conditionals in some tests. [ ]
In addition, Jelle van der Waa added external tool references for Arch Linux for ocamlobjinfo, openssl and ffmpeg [ ][ ][ ] and added Arch Linux as a Continuous Integration (CI) test target [ ]. Vagrant Cascadian updated the testsuite to skip Python bytecode comparisons when file(1) is older than 5.39 [ ], added external tool references for the Guix distribution for dumppdf and ppudump [ ][ ] and updated the diffoscope package in GNU Guix [ ][ ]. Lastly, Guangyuan Yang updated the FreeBSD package name on the website [ ], Mattia Rizzolo made a change to override a new Lintian warning due to the new test files [ ], Roland Clobus added support to detect and log if the GNU_BUILD_ID field in an ELF binary has been modified [ ], Sandro Jäckel updated a number of helpful links on the website [ ] and Sergei Trofimovich made the uImage test output support file(1) version 5.41 [ ].
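For readers who have not used the tool, a typical invocation looks something like the following (the file names are purely illustrative); diffoscope recursively unpacks both inputs and can render the differences as an HTML report:
# Compare two builds of the same Python bytecode file and write an HTML report.
diffoscope --html report.html build1/module.pyc build2/module.pyc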

reprotest reprotest is the Reproducible Builds project's end-user tool to build the same source code twice in widely differing environments, checking the binaries produced by the builds for any differences. This month, reprotest version 0.7.18 was uploaded to Debian unstable by Holger Levsen, which included a change by Holger to clarify that Python 3.9 is used nowadays [ ], as well as two changes by Vasyl Gello to implement realistic CPU architecture shuffling [ ] and to log the selected variations when the verbosity is configured at a sufficiently high level [ ]. Finally, Vagrant Cascadian updated reprotest to version 0.7.18 in GNU Guix.
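As a rough illustration of how reprotest is typically driven (the build command and artifact pattern below are just an example for a Debian source package, not a prescribed invocation):
# Build the package twice under varied environments and diff the resulting .deb files;
# individual variations can be tuned with the --vary option.
reprotest 'dpkg-buildpackage --no-sign -b' '../*.deb'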

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix unreproducible packages. We try to send all of our patches upstream where appropriate. We authored a large number of such patches this month, including:

Testing framework The Reproducible Builds project runs a testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Debian-related changes:
      • Incorporate a fix from bremner into builtin-pho related to binary-NMUs. [ ]
      • Keep bullseye environments around longer, in an attempt to fix a Jenkins issue. [ ]
      • Improve the documentation of buildinfos.debian.net. [ ]
      • Improve documentation for the builtin-pho setup. [ ][ ]
    • OpenWrt-related changes:
      • Also use -j1 for better debugging. [ ]
      • Document that Python 3.x is now used. [ ]
      • Enable further debugging for the toolchain build. [ ]
    • New snapshot.reproducible-builds.org service:
      • Actually add new node. [ ][ ]
      • Install xfsprogs on snapshot.reproducible-builds.org. [ ]
      • Create account for fpierret on new node. [ ]
      • Run node_health_check job on new node too. [ ]
  • Mattia Rizzolo:
    • Debian-related changes:
      • Handle schroot errors when invoking diffoscope instead of masking them. [ ][ ]
      • Declare and define some variables separately to avoid masking the subshell return code. [ ]
      • Fix variable name. [ ]
      • Improve log reporting. [ ]
      • Execute apt-get update with the -q argument to get more decent logs. [ ]
      • Set the Debian HTTP mirror and proxy for snapshot.reproducible-builds.org. [ ]
      • Install the libarchive-tools package (instead of bsdtar) when updating Jenkins nodes. [ ]
    • Be stricter about errors when starting the node agent [ ] and don't overwrite NODE_NAME so that we can expect Jenkins to set it properly for us [ ].
    • Explicitly warn if the NODE_NAME is not a fully-qualified domain name (FQDN). [ ]
    • Document whether a node runs in the future. [ ]
    • Disable postgresql_autodoc as it is not available in bullseye. [ ]
    • Don't be so eager when deleting schroot internals; call schroot -e to terminate the schroots instead. [ ]
    • Only consider schroot underlays for deletion that are over a month old. [ ][ ]
    • Only try to unmount /proc if it's actually mounted (see the sketch after this list). [ ]
    • Move the db_backup task to its own Jenkins job. [ ]
Lastly, Vasyl Gello added usage information to the reproducible_build.sh script [ ].
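The /proc safeguard referenced in the list above might look roughly like this (a sketch only; $CHROOT_TARGET is a placeholder, not the actual variable used by the jenkins.debian.net scripts):
# Only attempt to unmount /proc inside the build chroot if it is actually mounted,
# avoiding spurious "not mounted" errors in the cleanup path.
if mountpoint -q "$CHROOT_TARGET/proc"; then
    umount "$CHROOT_TARGET/proc"
fi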

Contributing If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. Alternatively, you can get in touch with us via:

6 October 2021

Reproducible Builds: Reproducible Builds in September 2021

The goal behind reproducible builds is to ensure that no deliberate flaws have been introduced during compilation processes, via promising or mandating that identical results are always generated from a given source. This allows multiple third parties to come to an agreement on whether a build was compromised or not via a system of distributed consensus. In these reports we outline the most important things that have been happening in the world of reproducible builds in the past month:
First mentioned in our March 2021 report, Martin Heinz published two blog posts on sigstore, a project that endeavours to offer software signing as a public good, [the] software-signing equivalent to Let's Encrypt. The first post, entitled Sigstore: A Solution to Software Supply Chain Security, outlines more about the project and justifies its existence:
Software signing is not a new problem, so there must be some solution already, right? Yes, but signing software and maintaining keys is very difficult especially for non-security folks and UX of existing tools such as PGP leave much to be desired. That's why we need something like sigstore - an easy to use software/toolset for signing software artifacts.
The second post (titled Signing Software The Easy Way with Sigstore and Cosign) goes into some technical details of getting started.
There was an interesting thread in the /r/Signal subreddit that started from the observation that Signal's APK doesn't match the source code:
Some time ago I checked Signal s reproducibility and it failed. I asked others to test in case I did something wrong, but nobody made any reports. Since then I tried to test the Google Play Store version of the apk against one I compiled myself, and that doesn t match either.
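Verifying such a claim generally comes down to diffing the two APKs, for instance with diffoscope (the file names here are placeholders; signing metadata is expected to differ between the two, everything else ideally should not):
# Compare the Play Store APK against a locally-built one.
diffoscope play-store.apk self-built.apk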

BitcoinBinary.org was announced this month, which aims to be a repository of Reproducible Build Proofs for Bitcoin Projects:
Most users are not capable of building from source code themselves, but we can at least get them able enough to check signatures and shasums. When reputable people can tell everyone they were able to reproduce the project's build, others at least have a secondary source of validation.
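In practice, checking signatures and shasums amounts to something like the following (file names are illustrative of the common SHA256SUMS convention):
# Check that the checksum file is signed by a key you trust,
# then verify the downloaded artifact against it.
gpg --verify SHA256SUMS.asc SHA256SUMS
sha256sum --check --ignore-missing SHA256SUMS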

Distribution work Frédéric Pierret announced a new testing service at beta.tests.reproducible-builds.org, showing actual rebuilds of binaries distributed by both the Debian and Qubes distributions. In Debian specifically, 51 reviews of Debian packages were added, 31 were updated and 31 were removed this month, adding to our database of classified issues. As part of this, Chris Lamb refreshed a number of notes, including the build_path_in_record_file_generated_by_pybuild_flit_plugin issue. Elsewhere in Debian, Roland Clobus posted his Fourth status update about reproducible live-build ISO images in Jenkins to our mailing list, which mentions (amongst other things) that:
  • All major configurations are still built regularly using live-build and bullseye.
  • All major configurations are reproducible now; Jenkins is green.
    • I've worked around the issue for the Cinnamon image.
    • The patch was accepted and released within a few hours.
  • My main focus for the last month was on the live-build tool itself.
Related to this, there was continuing discussion on how to embed/encode the build metadata for the Debian live images which were being worked on by Roland Clobus.
Ariadne Conill published another detailed blog post related to various security initiatives within the Alpine Linux distribution. After summarising some conventional security work being done (eg. with sudo and the release of OpenSSH version 3.0), Ariadne included another section on reproducible builds: The main blocker [was] determining what to do about storing the build metadata so that a build environment can be recreated precisely. Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report.

Community news On our website this month, Bernhard M. Wiedemann fixed some broken links [ ] and Holger Levsen made a number of changes to the Who is Involved? page [ ][ ][ ]. On our mailing list, Magnus Ihse Bursie started a thread with the subject Reproducible builds on Java, which begins as follows:
I'm working for Oracle in the Build Group for OpenJDK which is primary responsible for creating a built artifact of the OpenJDK source code. [ ] For the last few years, we have worked on a low-effort, background-style project to make the build of OpenJDK itself building reproducible. We've come far, but there are still issues I'd like to address. [ ]

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 183, 184 and 185 as well as performed significant triaging of merge requests and other issues in addition to making the following changes:
  • New features:
    • Support a newer format version of the R language's .rds files. [ ]
    • Update tests for OCaml 4.12. [ ]
    • Add a missing format_class import. [ ]
  • Bug fixes:
    • Don't call close_archive when garbage collecting Archive instances, unless open_archive definitely returned successfully. This prevents, for example, an AttributeError where PGPContainer's cleanup routines were rightfully assuming that its temporary directory had actually been created. [ ]
    • Fix (and test) the comparison of the R language's .rdb files after refactoring temporary directory handling. [ ]
    • Ensure that RPM archives exists in the Debian package description, regardless of whether python3-rpm is installed or not at build time. [ ]
  • Codebase improvements:
    • Use our assert_diff routine in tests/comparators/test_rdata.py. [ ]
    • Move diffoscope.versions to diffoscope.tests.utils.versions. [ ]
    • Reformat a number of modules with Black. [ ][ ]
However, the following changes were also made:
  • Mattia Rizzolo:
    • Fix an autopkgtest caused by the androguard module not being in the (expected) python3-androguard Debian package. [ ]
    • Appease a shellcheck warning in debian/tests/control.sh. [ ]
    • Ignore a warning from h5py in our tests that doesn't concern us. [ ]
    • Drop a trailing .1 from the Standards-Version field as it's not required. [ ]
  • Zbigniew Jędrzejewski-Szmek:
    • Stop using the deprecated distutils.spawn.find_executable utility. [ ][ ][ ][ ][ ]
    • Adjust an LLVM-related test for LLVM version 13. [ ]
    • Update invocations of llvm-objdump. [ ]
    • Adjust a test with a one-byte text file for file version 5.40. [ ]
And, finally, Benjamin Peterson added a --diff-context option to control unified diff context size [ ] and Jean-Romain Garnier fixed the Macho comparator for architectures other than x86-64 [ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Testing framework The Reproducible Builds project runs a testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Drop my package rebuilder prototype as it's not useful anymore. [ ]
    • Schedule old packages in Debian bookworm. [ ]
    • Stop scheduling packages for Debian buster. [ ][ ]
    • Don't include PostgreSQL debug output in package lists. [ ]
    • Detect Python library mismatches during build in the node health check. [ ]
    • Update a note on updating the FreeBSD system. [ ]
  • Mattia Rizzolo:
    • Silence a warning from Git. [ ]
    • Update a setting to reflect that Debian bookworm is the new testing. [ ]
    • Upgrade the PostgreSQL database to version 13. [ ]
  • Roland Clobus (Debian live image generation):
    • Workaround non-reproducible config files in the libxml-sax-perl package. [ ]
    • Use the new DNS for the snapshot service. [ ]
  • Vagrant Cascadian:
    • Note that the armhf architecture also systematically varies by the kernel. [ ]

Contributing If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. Alternatively, you can get in touch with us via:

1 October 2021

Russell Coker: Getting Started With Kali

Kali is a Debian-based distribution aimed at penetration testing. I haven't felt a need to use it in the past because Debian has packages for all the scanning tools I regularly use, and all the rest are free software that can be obtained separately. But I recently decided to try it. Here's the URL to get Kali [1]. For a VM you can get VMWare or VirtualBox images; I chose VMWare as it's the most popular image format and also a much smaller download (2.7G vs 4G). For unknown reasons the torrent for it didn't work (might be a problem with my torrent client). The download link for it was extremely slow in Australia, so I downloaded it to a system in Germany and then copied it from there. I don't want to use either VMWare or VirtualBox because I find KVM/QEMU sufficient to do everything I want and they are in the main section of Debian, so I needed to convert the image files. Some of the documentation on converting image formats to use with QEMU/KVM says to use a program called kvm-img, which doesn't seem to exist; I used qemu-img from the qemu-utils package in Debian/Bullseye. The man page qemu-img(1) doesn't list the output formats supported by the -O option, and the examples returned by a web search show using -O qcow2. It turns out that the following command will convert the image to raw format, which is the format I prefer. I use BTRFS for storing all my VM images and that does all the copy-on-write I need.
qemu-img convert Kali-Linux-2021.3-vmware-amd64.vmdk ../kali
After converting it the file was 500M smaller than the VMWare files (10.2 vs 10.7G). Probably the Kali distribution file could be reduced in size by converting it to raw and then back to VMWare format. The Kali VMWare image is compressed with 7zip, which has a good compression ratio; I waited almost 90 minutes for zstd to compress it with -19 and the result was 12% larger than the 7zip file. VMWare apparently likes to use an emulated SCSI controller, and I spent some time trying to get that going in KVM. Apparently recent versions of QEMU changed the way this works and therefore older web pages aren't helpful. Also, allegedly the SCSI emulation is buggy and unreliable (but I didn't manage to get it going so can't be sure). It turns out that the VM is configured to work with the virtio interface, and initramfs.conf has the configuration option MODULES=most which makes it boot on all common configurations (good work by the initramfs-tools maintainers). The image works well with the Spice display interface: the window for the VM works the same way as other windows on my desktop and doesn't capture the mouse cursor. I don't know if this level of Spice integration is in Debian now; last time I tested it didn't work that way. I also downloaded Metasploitable [2], which is a VM image designed to be full of security flaws for testing the tools that are in Kali. Again it worked nicely after converting from VMWare to raw format. One thing to note about Metasploitable is that you must not make it available on the public Internet. My home network has NAT for IPv4 but all systems get public IPv6 addresses. It's usually nice that those things just work on VMs, but not for this, so I added an iptables command to block IPv6 to /etc/rc.local. Conclusion Installing VMs for both these distributions was quite easy. Most of my time was spent downloading from a slow server, trying to get SCSI emulation working, working out how to convert image files, and testing different compression options. The time spent doing stuff once I knew what to do was very small. Kali has zsh as the default shell, and it's quite nice. I've been happy with bash for decades, but I might end up trying zsh out on other machines.
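For what it's worth, the IPv6 block mentioned above might look something like the following in /etc/rc.local; the address is the documentation prefix standing in for the VM's real autoconfigured address, and this is only a guess at the kind of rule used:
# Refuse to forward any IPv6 traffic to the deliberately vulnerable Metasploitable guest.
ip6tables -A FORWARD -d 2001:db8::1234 -j DROP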

12 September 2021

Russ Allbery: DocKnot 5.00

This release is the culmination of a project that I've been wanting to do for two years, but nearly all the work was done in the past week. That experience made me rethink some of my skepticism, but I'll get to that part of the story later. In March of 1999, I got tired of writing HTML by hand and wrote a small program called spin that implemented a macro language that translated into HTML. This makes it one of the oldest programs for which I have a continuous development history, predating podlators by three months. I think only News::Gateway (now very dormant) and Term::ANSIColor (still under active development but very stable) are older, as long as I'm not counting orphaned packages like newsyslog. I've used spin continuously ever since. It's grown features and an ecosystem of somewhat hackish scripts to do web publishing things I've wanted over the years: journal entries like this one, book reviews, a simple gallery (with some now-unfortunate decisions about maximum image size), RSS feeds, and translation of lots of different input files into HTML. But the core program itself, in all those years, has been one single Perl script written mostly in my Perl coding style from the early 2000s before I read Perl Best Practices. My web site is long overdue for an overhaul. Just to name a couple of obvious problems, it looks like trash on mobile browsers, and I'm using URL syntax from the early days of the web that, while it prompts some nostalgia for tildes, means all the URLs are annoyingly long and embed useless information such as the fact each page is written in HTML. Its internals also use a lot of ad hoc microformats (a bit of RFC 2822 here, a text-based format with significant indentation there, a weird space-separated database) and are supported by programs that extract meaning from human-written pages and perform automated updates to them rather than having a clear separation between structure and data. This will be a very large project, but it's the sort of quixotic personal project that I enjoy. Maintaining my own idiosyncratic static site generator is almost certainly not an efficient use of my time compared to, say, converting everything to Hugo. But I have 3,428 pages (currently) written in the thread macro language, plus numerous customizations that cater to my personal taste and interests, and, most importantly, I like having a highly customized system that I know exactly how to automate. The blocker has been that I didn't want to work on spin as it existed. It badly needed a structural overhaul and modernization, and even more badly needed a test suite since every release involved tedious manual testing by pouring over diffs between generations of the web site. And that was enough work to be intimidating, so I kept putting it off. I've separately been vaguely aware that I have been spending too much time reading Twitter (specifically) and the news (in general). It would be one thing if I were taking in that information to do something productive about it, but I haven't been. It's just doomscrolling. I've been thinking about taking a break for a while but it kept not sticking, so I decided to make a concerted effort this week. It took about four days to stop wanting to check Twitter and forcing myself to go do something else productive or at least play a game instead. Then I managed to get started on my giant refactoring project, and holy shit, Twitter has been bad for my attention span! I haven't been able to sustain this level of concentration for hours at a time in years. 
Twitter's not the only thing to blame (there are a few other stressers that I've fixed in the past couple of years), but it's obviously a huge part. Anyway, this long personal ramble is prelude to the first release of DocKnot that includes my static site generator. This is not yet the full tooling from my old web tools page; specifically, it's missing faq2html, cl2xhtml, and cvs2xhtml. (faq2html will get similar modernization treatment, cvs2xhtml will probably be rewritten in Perl since I have some old, obsolete scripts that may live in CVS forever, and I may retire cl2xhtml since I've stopped using the GNU ChangeLog format entirely.) But DocKnot now contains the core of my site generation system, including the thread macro language, POD conversion (by way of Pod::Thread), and RSS feeds. Will anyone else ever use this? I have no idea; realistically, probably not. If you were starting from scratch, I'm sure you'd be better off with one of the larger and more mature static site generators that's not the idiosyncratic personal project of one individual. It is packaged for Debian because it's part of the tool chain for generating files (specifically README.md) that are included in every package I maintain, and thus is part of the transitive closure of Debian main, but I'm not sure anyone will install it from there for any other purpose. But for once making something for someone else isn't the point. This is my quirky, individual way to maintain web sites that originated in an older era of the web and that I plan to keep up-to-date (I'm long overdue to figure out what they did to HTML after abandoning the XHTML approach) because it brings me joy to do things this way. In addition to adding the static site generator, this release also has the regular sorts of bug fixes and minor improvements: better formatting of software pages for software that's packaged for Debian, not assuming every package has a TODO file, and ignoring Autoconf 2.71 backup files when generating distribution tarballs. You can get the latest version of DocKnot from CPAN as App-DocKnot, or from its distribution page. I know I haven't yet updated my web tools page to reflect this move, or changed the URL in the footer of all of my pages. This transition will be a process over the next few months and will probably prompt several more minor releases.

5 September 2021

Reproducible Builds: Reproducible Builds in August 2021

Welcome to the latest report from the Reproducible Builds project. In this post, we round up the important things that happened in the world of reproducible builds in August 2021. As always, if you are interested in contributing to the project, please visit the Contribute page on our website.
There were a large number of talks related to reproducible builds at DebConf21 this year, the 21st annual conference of the Debian Linux distribution (full schedule):
PackagingCon (@PackagingCon) is a new conference for developers of package management software as well as their related communities and stakeholders. The virtual event, which is scheduled to take place on the 9th and 10th November 2021, has a mission to bring different ecosystems together: from Python's pip to Rust's cargo to Julia's Pkg, from Debian apt over Nix to conda and mamba, and from vcpkg to Spack we hope to have many different approaches to package management at the conference. A number of people from the reproducible builds community are planning on attending this new conference, and some may even present. Tickets start at $20 USD.
As reported in our May report, the president of the United States signed an executive order outlining policies aimed to improve the cybersecurity in the US. The executive order comes after a number of highly-publicised security problems such as a ransomware attack that affected an oil pipeline between Texas and New York and the SolarWinds hack that affected a large number of US federal agencies. As a followup this month, a detailed fact sheet was released announcing a number of large-scale initiatives that will undoubtedly be related to software supply chain security and, as a result, reproducible builds.
Lastly, we ran another productive meeting on IRC in August (original announcement) which ran for just short of two hours. A full set of notes from the meeting is available.

Software development kpcyrd announced an interesting new project this month called I probably didn t backdoor this which is an attempt to be:
a practical attempt at shipping a program and having reasonably solid evidence there's probably no backdoor. All source code is annotated and there are instructions explaining how to use reproducible builds to rebuild the artifacts distributed in this repository from source. The idea is shifting the burden of proof from you need to prove there's a backdoor to we need to prove there's probably no backdoor. This repository is less about code (we're going to try to keep code at a minimum actually) and instead contains technical writing that explains why these controls are effective and how to verify them. You are very welcome to adopt the techniques used here in your projects. ( )
As the project's README goes on to mention: the techniques used to rebuild the binary artifacts are only possible because the builds for this project are reproducible. This was also announced on our mailing list this month in a thread titled i-probably-didnt-backdoor-this: Reproducible Builds for upstreams. kpcyrd also wrote a detailed blog post about the problems surrounding Linux distributions (such as Alpine and Arch Linux) that distribute compiled Python bytecode in the form of .pyc files generated during the build process.
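As background to the .pyc problem: by default, CPython embeds the source file's timestamp in compiled bytecode, but since PEP 552 it can use hash-based invalidation instead, removing that particular source of nondeterminism. A minimal sketch (the path is hypothetical):
# Byte-compile a directory tree using hash-based .pyc invalidation,
# so the output no longer depends on the source files' modification times.
python3 -m compileall --invalidation-mode checked-hash /usr/lib/python3/dist-packages/example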

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made a number of changes, including releasing versions 180, 181 and 182, as well as the following changes:
  • New features:
    • Add support for extracting the signing block from Android APKs. [ ]
    • If we specify a suffix for a temporary file or directory within the code, ensure it starts with an underscore (ie. _ ) to make the generated filenames more human-readable. [ ]
    • Don't include short GCC lines that differ on a single prefix byte either. These are distracting, not very useful and are simply the strings(1) command's idea of the build ID, which is displayed elsewhere in the diff. [ ][ ]
    • Don't include specific .debug-like lines in the ELF-related output, as it is invariably a duplicate of the debug ID that exists better in the readelf(1) differences for this file. [ ]
  • Bug fixes:
    • Add a special case to SquashFS image extraction to not fail if we aren't the superuser. [ ]
    • Only use java -jar /path/to/apksigner.jar if we have an apksigner.jar as newer versions of apksigner in Debian use a shell wrapper script which will be rejected if passed directly to the JVM. [ ]
    • Reduce the maximum line length for calculating Wagner-Fischer, improving the speed of output generation a lot. [ ]
    • Don't require apksigner in order to compare .apk files using apktool. [ ]
    • Update calls (and tests) for the new version of odt2txt. [ ]
  • Output improvements:
    • Mention in the output if the apksigner tool is missing. [ ]
    • Profile diffoscope.diff.linediff and specialize. [ ][ ]
  • Logging improvements:
    • Format debug-level messages related to ELF sections using the diffoscope.utils.format_class. [ ]
    • Print the size of generated reports in the logs (if possible). [ ]
    • Include profiling information in --debug output if --profile is not set. [ ]
  • Codebase improvements:
    • Clarify a comment about the HUGE_TOOLS Python dictionary. [ ]
    • We can pass -f to apktool to avoid creating a strangely-named subdirectory. [ ]
    • Drop an unused File import. [ ]
    • Update the supported & minimum version of Black. [ ]
    • We don't use the logging variable in a specific place, so alias it to an underscore (ie. _ ) instead. [ ]
    • Update some various copyright years. [ ]
    • Clarify a comment. [ ]
  • Test improvements:
    • Update a test to check specific contents of SquashFS listings, otherwise it fails depending on the test system's user-ID-to-username passwd(5) mapping. [ ]
    • Assign seen and expected values to local variables to improve contextual information in failed tests. [ ]
    • Don't print an orphan newline when the source code formatting test passes. [ ]

In addition, Santiago Torres Arias added support for Squashfs version 4.5 [ ] and Felix C. Stegerman suggested a number of small improvements to the output of the new APK signing block [ ]. Lastly, Chris Lamb uploaded python-libarchive-c version 3.1-1 to Debian experimental for the new 3.x branch; python-libarchive-c is used by diffoscope.

Distribution work In Debian, 68 reviews of packages were added, 33 were updated and 10 were removed this month, adding to our knowledge about identified issues. Two new issue types have been identified too: nondeterministic_ordering_in_todo_items_collected_by_doxygen and kodi_package_captures_build_path_in_source_filename_hash. kpcyrd published another monthly report on their work on reproducible builds within the Alpine and Arch Linux distributions, specifically mentioning rebuilderd, one of the components powering reproducible.archlinux.org. The report also touches on binary transparency, an important component for supply chain security. The @GuixHPC account on Twitter posted an infographic on what fraction of GNU Guix packages are bit-for-bit reproducible: Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report for openSUSE.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including: Elsewhere, it was discovered that when supporting various new language features and APIs for Android apps, the resulting APK files that are generated now vary wildly from build to build (example diffoscope output). Happily, it appears that a patch has been committed to the relevant source tree. This was also discussed on our mailing list this month in a thread titled Android desugaring and reproducible builds started by Marcus Hoffmann.

Website and documentation There were quite a few changes to the Reproducible Builds website and documentation this month, including:
  • Felix C. Stegerman:
    • Update the website self-build process to not use the buster-backports suite now that Debian Bullseye is the stable release. [ ]
  • Holger Levsen:
    • Add a new page documenting various package rebuilder solutions. [ ]
    • Add some historical talks and slides from DebConf20. [ ][ ]
    • Various improvements to the history page. [ ][ ][ ]
    • Rename the Comparison protocol documentation category to Verification. [ ]
    • Update links to F-Droid documentation. [ ]
  • Ian Muchina:
    • Increase the font size of titles and de-emphasize event details on the talk page. [ ]
    • Rename the README file to README.md to improve the user experience when browsing the Git repository in a web browser. [ ]
  • Mattia Rizzolo:
    • Drop a position:fixed CSS statement that is negatively affecting some width settings. [ ]
    • Fix the sizing of the elements inside the side navigation bar. [ ]
    • Show gold level sponsors and above in the sidebar. [ ]
    • Updated the documentation within reprotest to mention how ldconfig conflicts with the kernel variation. [ ]
  • Roland Clobus:
    • Added a ticket number for the issue with the live Cinnamon image and diffoscope. [ ]

Testing framework The Reproducible Builds project runs a testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Debian-related changes:
      • Make a large number of changes to support the new Debian bookworm release, including adding it to the dashboard [ ], start scheduling tests [ ], adding suitable Apache redirects [ ] etc. [ ][ ][ ][ ][ ]
      • Make the first build use LANG=C.UTF-8 to match the official Debian build servers. [ ]
      • Only test Debian Live images once a week. [ ]
      • Upgrade all nodes to use Debian Bullseye [ ] [ ]
      • Update README documentation for the Debian Bullseye release. [ ]
    • Other changes:
      • Only include rsync output if the $DEBUG variable is enabled. [ ]
      • Don t try to install mock, a tool used to build Fedora packages some time ago. [ ]
      • Drop an unused function. [ ]
      • Various documentation improvements. [ ][ ]
      • Improve the node health check to detect zombie jobs. [ ]
  • Jessica Clarke (FreeBSD-related changes):
    • Update the location and branch name for the main FreeBSD Git repository. [ ]
    • Correctly ignore the source tarball when comparing build results. [ ]
    • Drop an outdated version number from the documentation. [ ]
  • Mattia Rizzolo:
    • Block F-Droid jobs from running whilst the setup is running. [ ]
    • Enable debugging for the rsync job related to Debian Live images. [ ]
    • Pass BUILD_TAG and BUILD_URL environment for the Debian Live jobs. [ ]
    • Refactor the master_wrapper script to use a Bash array for the parameters. [ ]
    • Prefer YAML s safe_load() function over the unsafe variant. [ ]
    • Use the correct variable in the Apache config to match possible existing files on disk. [ ]
    • Stop issuing HTTP 301 redirects for things that are not actually permanent. [ ]
  • Roland Clobus (Debian live image generation):
    • Increase the diffoscope timeout from 120 to 240 minutes; the Cinnamon image should now be able to finish. [ ]
    • Use the new snapshot service. [ ]
    • Make a number of improvements to artifact handling, such as moving the artifacts to the Jenkins host [ ] and correctly cleaning them up at the right time. [ ][ ][ ]
    • Where possible, link to the Jenkins build URL that created the artifacts. [ ][ ]
    • Only allow one job to run at the same time. [ ]
  • Vagrant Cascadian:
    • Temporarily disable armhf nodes for DebConf21. [ ][ ]

Lastly, if you are interested in contributing to the Reproducible Builds project, please visit the Contribute page on our website. You can get in touch with us via:
